Async Usage Parsing Workflows

Async usage parsing workflows are the non-blocking ingestion tier that turns rate-limited provider APIs, engine telemetry, and billing exports into deterministic, chargeback-ready cost records without stalling on network latency.

Back to: Metric Extraction & Aggregation Pipelines

This dimension of Metric Extraction & Aggregation Pipelines covers how Cloud DBA and FinOps teams fetch usage concurrently, normalize it against a canonical schema, enforce quotas from the parsed result, and recover cleanly when a provider throttles or a payload drifts. Synchronous polling loops that were adequate at ten databases collapse at a thousand: sequential pagination against AWS Cost Explorer, Azure Cost Management, and per-engine metering views serializes latency you cannot afford at month-end. Structured concurrency solves this — but only if backpressure, validation, and idempotency are designed in from the first coroutine rather than bolted on after the first ThrottlingException.

The diagram below traces a usage record from concurrent rate-limited fetches through parsing, validation, and aggregation.

Billing Model & Attribution Challenges

The core difficulty is that no two consumption sources agree on granularity, currency, or time semantics. AWS Cost Explorer returns amortized daily line items grouped by LINKED_ACCOUNT, SERVICE, or tag key; Azure Cost Management emits actual and amortized cost rows on independent schedules; and engine-level metering — the WAREHOUSE_METERING_HISTORY and QUERY_HISTORY views on Snowflake, or pg_stat_statements on PostgreSQL — reports credits and execution time that must be priced after the fact. An async parser has to reconcile all three into one row shape without letting the fastest source overwrite the slowest.

Blended versus disaggregated billing is the first trap. Blended rates average the cost of a resource across an organization, so a coroutine that reads a blended figure and attributes it to a single tenant will systematically misprice reserved-instance and Savings Plan coverage. Parsers should request UnblendedCost (and AmortizedCost for reservation-aware attribution) explicitly, and the split between the two mirrors the same decisions covered in how compute and storage costs break down per resource. The second trap is temporal skew: Cost Explorer data finalizes up to 24–48 hours after usage, while metering views are near-real-time, so a naive join keyed only on date double-counts or drops records. The parser must carry an ingestion_watermark and treat late-arriving cost as an idempotent upsert, not an append.

Currency and unit normalization is where most silent corruption enters. A record priced in USD per GB-month cannot be summed with one priced in credits per hour until both are resolved to a unit cost against a canonical resource model. This is the same alignment problem solved at the account boundary when normalizing provider billing exports into a unified schema; the async layer simply enforces it earlier, at ingestion, so no downstream aggregation ever sees a mixed-unit row. The canonical usage record the parser emits is a flat, tag-enriched row keyed by tenant_id, resource_id, usage_type, usage_unit, quantity, unit_cost, cost_center, and an event timestamp.
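As a minimal sketch of resolving mixed units before any summation, the snippet below converts two differently priced quantities to one currency. The `UNIT_RATES` table, the `to_canonical` helper, and every rate in it are hypothetical placeholders for illustration, not real provider prices or part of the original pipeline.

```python
from decimal import Decimal

# Hypothetical conversion table: source unit -> (canonical currency, USD per unit).
# The rates are illustrative placeholders, not real provider prices.
UNIT_RATES = {
    "credits": ("USD", Decimal("3.00")),     # assumed contract price per credit
    "GB-month": ("USD", Decimal("0.023")),   # assumed storage price per GB-month
}

def to_canonical(quantity: Decimal, unit: str) -> tuple[Decimal, str]:
    """Resolve a quantity in a source unit to a cost in the canonical currency.

    An unknown unit raises KeyError, so the row is rejected at ingestion
    instead of being summed as a mixed-unit value.
    """
    currency, rate = UNIT_RATES[unit]
    return quantity * rate, currency

cost_a, cur_a = to_canonical(Decimal("12.5"), "credits")
cost_b, cur_b = to_canonical(Decimal("400"), "GB-month")
total = cost_a + cost_b  # safe to add: both sides are now USD
```

Failing loudly on an unknown unit is the design choice that matters here: a silent default rate would reintroduce exactly the mixed-unit corruption the ingestion layer is meant to stop.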

Attribution of an aggregated cost center total from these rows is deterministic:

$$C_{cc} = \sum_{i \in cc} q_i \cdot u_i$$

where $q_i$ is the parsed quantity and $u_i$ the resolved per-unit cost for each usage record i mapped to cost center cc. Because the sum is exact, any variance between $C_{cc}$ and the provider invoice is a data-quality signal — a missing tag, a dropped page, or a dead-lettered payload — not rounding noise.
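The attribution sum and the variance check against an invoice can be sketched as follows. The `attribute` helper, the sample rows, and the invoice figures are all made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal

def attribute(records):
    """Sum quantity * unit_cost per cost center: C_cc = sum(q_i * u_i)."""
    totals = defaultdict(Decimal)
    for cost_center, quantity, unit_cost in records:
        totals[cost_center] += quantity * unit_cost
    return dict(totals)

# Illustrative rows: (cost_center, quantity, unit_cost).
records = [
    ("analytics", Decimal("10"), Decimal("0.25")),
    ("analytics", Decimal("4"), Decimal("1.50")),
    ("platform", Decimal("2"), Decimal("3.00")),
]
totals = attribute(records)

# Hypothetical invoice figures for the same window. A non-zero variance is a
# data-quality signal (missing tag, dropped page, dead-lettered payload).
invoice = {"analytics": Decimal("8.50"), "platform": Decimal("7.00")}
variance = {cc: invoice[cc] - totals.get(cc, Decimal(0)) for cc in invoice}
```

In this sample the `analytics` total reconciles exactly, while `platform` is short by 1.00, which is the kind of gap that should trigger a look at the dead-letter sink rather than be written off as rounding.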

Telemetry Extraction & Metric Normalization

Extraction is an I/O-bound fan-out: for every account, region, resource group, or shard, the parser dispatches an independent fetch and lets the event loop overlap the wait. The governing constraint is not CPU but the provider’s transactions-per-second ceiling, so concurrency must be bounded rather than unbounded. An asyncio.Semaphore sized to the API quota is the deterministic governor; task groups own the lifetimes so a single failure cancels siblings cleanly instead of leaking connections.

import asyncio
import aioboto3

async def fetch_cost_page(session, semaphore, account_id, start, end, token=None):
    """Fetch one Cost Explorer page for an account under the concurrency gate."""
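    params = {
        "TimePeriod": {"Start": start, "End": end},  # End date is exclusive
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost", "AmortizedCost", "UsageQuantity"],
        # Scope the query to this account; without the filter every worker
        # would fetch the same organization-wide rows and double-count them.
        "Filter": {"Dimensions": {"Key": "LINKED_ACCOUNT", "Values": [account_id]}},
        "GroupBy": [
            {"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"},
            {"Type": "TAG", "Key": "cost_center"},
        ],
    }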
    if token:
        params["NextPageToken"] = token

    async with semaphore:  # never exceed the provisioned TPS ceiling
        async with session.client("ce", region_name="us-east-1") as ce:
            return await ce.get_cost_and_usage(**params)


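async def fetch_account(session, semaphore, account_id, start, end):
    """Follow pagination for a single account, yielding every result group."""
    token = None
    while True:
        page = await fetch_cost_page(session, semaphore, account_id, start, end, token)
        # Each ResultsByTime entry is one time bucket; the per-key rows live in
        # its "Groups" list, while the period lives on the bucket itself.
        for result in page["ResultsByTime"]:
            for group in result.get("Groups", []):
                # Attach the bucket's period so each group is self-describing.
                yield account_id, {**group, "TimePeriod": result["TimePeriod"]}
        token = page.get("NextPageToken")
        if not token:  # an absent token, not an empty page, ends pagination
            break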

Pagination is the schema-drift surface most parsers get wrong. Cost Explorer returns NextPageToken; the boto3 paginate helpers hide it for synchronous callers, but aioboto3 exposes it directly, so the loop above must treat an absent token, not an empty page, as the terminal condition. Resources that lack the grouped tag still appear in the response, with an empty tag value (a key of the form cost_center$), and a response grouped on fewer dimensions than the parser assumes has a shorter Keys array, so key access must be positional-safe and an empty value must map to an explicit bucket. Extracting the same signals from database internals rather than the billing API, isolating billable tenant compute from autovacuum and background workers, follows the system view querying patterns documented alongside this workflow.
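The terminal condition can be exercised without touching AWS. The sketch below uses a simulated page store (`FAKE_PAGES` and `fake_fetch` are hypothetical stand-ins, not a real API) in which the middle page is empty but still carries a token, so a loop that stopped on an empty page would lose the last record.

```python
import asyncio

# Simulated pages keyed by the token that requests them. The middle page is
# empty but still has a token, so "empty page" must not end the loop.
FAKE_PAGES = {
    None: {"ResultsByTime": [{"id": 1}], "NextPageToken": "t1"},
    "t1": {"ResultsByTime": [], "NextPageToken": "t2"},
    "t2": {"ResultsByTime": [{"id": 2}]},  # no token: terminal page
}

async def fake_fetch(token=None):
    """Stand-in for a paginated API call; returns the page for a token."""
    await asyncio.sleep(0)  # yield to the event loop like a real request would
    return FAKE_PAGES[token]

async def iter_results():
    """Yield every result, terminating only when the token is absent."""
    token = None
    while True:
        page = await fake_fetch(token)
        for item in page.get("ResultsByTime", []):
            yield item
        token = page.get("NextPageToken")
        if not token:
            break

async def collect():
    return [item["id"] async for item in iter_results()]

ids = asyncio.run(collect())
```

Both records come back even though an empty page sits between them, which is the behavior a token-driven loop guarantees and a page-driven loop does not.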

Normalization runs immediately after each page arrives, while the payload is still in memory, so malformed rows never reach the aggregator. The translation maps provider-specific identifiers to the canonical model and resolves units; validating that every emitted row satisfies the contract is the responsibility of the schema validation rules applied at the billing-data boundary. Using pydantic with model_validate gives fail-fast rejection and a structured error to route downstream:

from datetime import datetime
from decimal import Decimal
from pydantic import BaseModel, Field, ValidationError

class UsageRecord(BaseModel):
    tenant_id: str
    resource_id: str
    usage_type: str
    usage_unit: str
    quantity: Decimal = Field(ge=0)
    unit_cost: Decimal = Field(ge=0)
    cost_center: str
    timestamp: datetime

def normalize_group(account_id, group):
    """Map one Cost Explorer result group to a canonical UsageRecord."""
    keys = group.get("Keys", [])
    # Tag keys arrive as "cost_center$<value>"; untagged usage has an empty value.
    tag_value = keys[1].split("$", 1)[-1] if len(keys) > 1 else ""
    cost_center = tag_value or "untagged"
    metrics = group["Metrics"]
    quantity = Decimal(metrics["UsageQuantity"]["Amount"])
    amount = Decimal(metrics["UnblendedCost"]["Amount"])
    if quantity:
        unit_cost = amount / quantity
    elif amount:
        # Cost with zero quantity has no unit cost; raise so the caller
        # dead-letters the row instead of silently dropping the spend.
        raise ArithmeticError("non-zero cost with zero usage quantity")
    else:
        unit_cost = Decimal(0)
    return UsageRecord(
        tenant_id=account_id,
        resource_id=keys[0] if keys else account_id,
        usage_type="compute",
        usage_unit=metrics["UsageQuantity"]["Unit"],
        quantity=quantity,
        unit_cost=unit_cost,
        cost_center=cost_center,
        timestamp=datetime.fromisoformat(group["TimePeriod"]["Start"]),
    )

Using Decimal rather than float is not optional for financial data — binary floating point silently corrupts summed cent-level costs across thousands of rows, and the resulting variance is indistinguishable from a real attribution error during reconciliation.
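A short demonstration of the drift, using ten line items of ten cents each:

```python
from decimal import Decimal

# Ten line items of ten cents each, summed both ways.
float_total = sum([0.1] * 10)
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))

# Binary floating point cannot represent 0.1 exactly, so the error accumulates.
float_is_exact = float_total == 1.0
# Decimal keeps the cents exact.
decimal_is_exact = decimal_total == Decimal("1.00")
```

Here `float_is_exact` is False and `decimal_is_exact` is True. The discrepancy is tiny per row, but it grows with row count and cannot be told apart from a genuine attribution gap once totals are compared with an invoice.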

Python Automation Patterns

The idiomatic structure is a bounded producer/consumer built on asyncio.TaskGroup (Python 3.11+): producers fan out per-account fetches through the semaphore, each result is normalized and validated in flight, and valid records stream into an aggregation buffer while rejects go to a dead-letter sink. The concurrency ceiling is derived, not guessed. Little’s Law gives the number of in-flight requests needed to saturate a target throughput without exceeding the quota:

$$N_{\max} = \left\lceil \lambda \cdot \bar{t} \right\rceil$$

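where $\lambda$ is the target requests-per-second the provider allows and $\bar{t}$ is the mean request latency in seconds. Sizing the semaphore to $N_{\max}$ keeps the pool close to saturated at the mean latency. A semaphore caps concurrency rather than rate, so when latency falls below $\bar{t}$ the same $N_{\max}$ slots can briefly exceed $\lambda$; the per-fetch backoff absorbs that residual throttling.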
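A worked instance of the sizing rule, followed by a check that a semaphore of that size really bounds in-flight work. The figures (5 requests per second, 1.5 s mean latency) and the helper names `max_in_flight` and `peak_concurrency` are illustrative assumptions, and `asyncio.gather` stands in for `TaskGroup` only so the sketch also runs on interpreters older than 3.11.

```python
import asyncio
import math

def max_in_flight(rate_per_s: float, mean_latency_s: float) -> int:
    """Little's Law: in-flight requests = request rate * mean latency."""
    return math.ceil(rate_per_s * mean_latency_s)

# Illustrative figures only: 5 requests/s allowed, 1.5 s mean latency.
n_max = max_in_flight(5, 1.5)  # ceil(7.5)

async def peak_concurrency(n_tasks: int, limit: int) -> int:
    """Run simulated fetches behind a semaphore; return the peak in-flight count."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_fetch():
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for network latency
            in_flight -= 1

    await asyncio.gather(*(fake_fetch() for _ in range(n_tasks)))
    return peak

peak = asyncio.run(peak_concurrency(n_tasks=20, limit=n_max))
```

With these inputs `n_max` is 8, and the measured peak never rises above it however many tasks are queued.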

import asyncio
from collections import defaultdict

async def run_pipeline(account_ids, start, end, max_concurrency=8):
    """Fan out per-account fetches, validate in flight, aggregate by cost center."""
    semaphore = asyncio.Semaphore(max_concurrency)
    session = aioboto3.Session()
    totals = defaultdict(Decimal)
    dead_letters = []

    async def worker(account_id):
        async for acct, group in fetch_account(session, semaphore, account_id, start, end):
            try:
                record = normalize_group(acct, group)
            except (ValidationError, KeyError, ArithmeticError) as exc:
                dead_letters.append({"account": acct, "raw": group, "error": str(exc)})
                continue
            totals[record.cost_center] += record.quantity * record.unit_cost

    async with asyncio.TaskGroup() as tg:
        for account_id in account_ids:
            tg.create_task(worker(account_id))

    return dict(totals), dead_letters

Retries belong on the individual fetch, not the whole pipeline, so a transient throttle on one account never restarts work that already succeeded elsewhere. A tenacity-decorated coroutine with exponential backoff and jitter keeps retry storms from synchronizing across workers:

from tenacity import (
    retry, retry_if_exception, stop_after_attempt, wait_exponential_jitter,
)
from botocore.exceptions import ClientError

THROTTLE_CODES = {"ThrottlingException", "TooManyRequestsException"}

def is_throttle(exc):
    """Return True only for provider throttling errors."""
    return (
        isinstance(exc, ClientError)
        and exc.response.get("Error", {}).get("Code") in THROTTLE_CODES
    )

@retry(
    # Retry throttles only; any other ClientError surfaces immediately
    # instead of being retried blindly.
    retry=retry_if_exception(is_throttle),
    wait=wait_exponential_jitter(initial=1, max=30),
    stop=stop_after_attempt(6),
    reraise=True,
)
async def fetch_cost_page_resilient(*args, **kwargs):
    """Same signature as fetch_cost_page; call it from fetch_account in its place."""
    return await fetch_cost_page(*args, **kwargs)

For teams standardizing this end to end, the full reference — connection reuse, pagination-token management, and cost-dimension mapping — is worked through in building async Python parsers for AWS Cost Explorer. The equivalent HTTP-level patterns for non-boto providers use httpx.AsyncClient with limits=httpx.Limits(max_connections=N, max_keepalive_connections=N) and the same semaphore discipline, so the same worker shape ports directly to Azure Cost Management or a database provider’s REST metering endpoint.

Quota Enforcement Integration

Parsed cost is only useful if it drives a decision. The aggregated totals dictionary produced above is the input to quota evaluation: each cost_center total is compared against its configured soft and hard boundaries, and the parser emits the breach events that downstream enforcement acts on. Translating those normalized cost signals into hard and soft limits is governed by database quota boundary design, which defines what a soft-warning threshold and a hard-stop ceiling mean per tenant.

The routing of a parsed total depends on the operational SLA. Sub-minute quota decisions — cutting off a runaway tenant before spend accrues — consume the parser’s output through the real-time metric streaming setup, where each validated record is published to a stream the enforcement service tails. Month-end reconciliation and chargeback, by contrast, feed the same records into batch processing for historical metrics, where completeness matters more than latency and late-arriving cost is folded in via idempotent upserts.

def evaluate_quotas(totals, policies):
    """Yield breach events from parsed cost-center totals.
    `policies` maps cost_center -> {"soft": Decimal, "hard": Decimal}."""
    for cost_center, spend in totals.items():
        policy = policies.get(cost_center)
        if not policy:
            continue
        if spend >= policy["hard"]:
            yield {"cost_center": cost_center, "level": "hard", "spend": spend}
        elif spend >= policy["soft"]:
            yield {"cost_center": cost_center, "level": "soft", "spend": spend}

Because enforcement can suspend real workloads, the parser must guarantee that a hard breach is never fired from incomplete data. A total computed from a run where any contributing account dead-lettered a group or failed to finish pagination is partial and must be marked as such, so the enforcement layer can choose to warn rather than hard-stop on an unreliable figure. Carrying a complete: bool flag alongside each cost-center total, set to true only when every account that feeds that cost center finished pagination cleanly, is the difference between a defensible quota action and an outage caused by a dropped page.
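One way to act on that flag is sketched below. The `evaluate_with_completeness` function and the `complete` mapping are hypothetical. They assume the parser publishes a per-cost-center boolean next to the totals, which the pipeline shown earlier does not yet do.

```python
from decimal import Decimal

def evaluate_with_completeness(totals, complete, policies):
    """Yield breach events, downgrading hard breaches on partial totals.

    `complete` maps cost_center -> bool (a hypothetical side channel from the
    parser). A missing entry is treated as incomplete.
    """
    for cost_center, spend in totals.items():
        policy = policies.get(cost_center)
        if not policy:
            continue
        is_complete = complete.get(cost_center, False)
        if spend >= policy["hard"]:
            # Only a complete total may trigger a hard stop.
            level = "hard" if is_complete else "soft"
        elif spend >= policy["soft"]:
            level = "soft"
        else:
            continue
        yield {"cost_center": cost_center, "level": level,
               "spend": spend, "complete": is_complete}

# Illustrative inputs: both centers exceed the hard limit, one total is partial.
totals = {"analytics": Decimal("120"), "platform": Decimal("120")}
complete = {"analytics": True, "platform": False}
limits = {"soft": Decimal("80"), "hard": Decimal("100")}
policies = {"analytics": limits, "platform": limits}
events = list(evaluate_with_completeness(totals, complete, policies))
```

Both cost centers exceed the hard limit, but only `analytics` produces a hard event; `platform` is downgraded to a warning because its total came from a partial run.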

Failure Modes & Troubleshooting

ThrottlingException / TooManyRequestsException. The most common failure at scale, triggered when concurrency exceeds the provider ceiling. The fix is to lower the semaphore size toward $N_{\max}$, honor the Retry-After header when present, and add jitter so retries do not resynchronize. Deeper treatment of adaptive backpressure, token buckets, and circuit breakers lives in handling rate limits when pulling database metrics.

Schema mismatch / short Keys array. Grouping by a tag absent on some resources returns groups whose tag key has an empty value (cost_center$), and a response with fewer grouping dimensions than expected has a shorter Keys array; unguarded parsing then raises IndexError or silently attributes cost to the wrong center. Always access group keys positionally with a length guard, default missing or empty tags to an explicit untagged bucket, and let pydantic reject any record whose cost_center fails to resolve rather than writing a null.

Missing or delayed tags. Cost-allocation tags propagate asynchronously and can lag activation by up to 24 hours, so freshly launched resources appear untagged in Cost Explorer before catching up. Treat untagged as a first-class cost center, alert when its share crosses a threshold, and re-run affected windows once tags settle rather than back-filling by hand.
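The alerting rule can be as small as a share calculation over the parsed totals. In this sketch the `untagged_share` helper, the sample totals, and the 5% threshold are illustrative assumptions.

```python
from decimal import Decimal

def untagged_share(totals, bucket="untagged"):
    """Return the fraction of total spend that landed in the untagged bucket."""
    total = sum(totals.values(), Decimal(0))
    if total == 0:
        return Decimal(0)  # nothing spent, nothing to alert on
    return totals.get(bucket, Decimal(0)) / total

# Illustrative cost-center totals for one window.
totals = {
    "analytics": Decimal("60"),
    "platform": Decimal("30"),
    "untagged": Decimal("10"),
}
share = untagged_share(totals)
should_alert = share > Decimal("0.05")  # hypothetical 5% threshold
```

A window that trips the alert is a candidate for a re-run once tags settle, which the idempotent write path makes safe.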

Event-loop starvation from blocking calls. A synchronous boto3 client, a blocking psycopg2 query, or a CPU-bound normalization inside a coroutine blocks the entire loop and collapses throughput to serial. Keep the hot path fully async (aioboto3, asyncpg, httpx) and push any unavoidable blocking I/O to asyncio.to_thread. CPU-heavy work gains little from a thread because of the GIL and is better moved to a process pool.
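A minimal sketch of the offloading pattern, where `blocking_query` is a hypothetical stand-in for a synchronous driver call and a heartbeat task shows that the loop stays responsive while the call runs:

```python
import asyncio
import time

def blocking_query():
    """Hypothetical synchronous driver call that blocks its thread."""
    time.sleep(0.2)
    return "rows"

async def heartbeat(ticks):
    """Record a timestamp every 20 ms; only progresses if the loop is free."""
    for _ in range(5):
        await asyncio.sleep(0.02)
        ticks.append(time.monotonic())

async def main():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    # The blocking call runs in a worker thread, so the loop keeps scheduling.
    result = await asyncio.to_thread(blocking_query)
    done = time.monotonic()
    await hb
    ticks_during = sum(1 for t in ticks if t < done)
    return result, ticks_during

result, ticks_during = asyncio.run(main())
```

Calling `blocking_query()` directly inside `main` would freeze the heartbeat for the full duration of the query; with `asyncio.to_thread` the heartbeat keeps ticking while the query is in progress.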

Connection exhaustion and file-descriptor leaks. Unbounded fan-out or clients created per request exhaust the descriptor table and surface as intermittent connection resets. Reuse a single session, cap connections with the semaphore and client limits, and scope every client with async with so TaskGroup cancellation releases them deterministically.

Non-idempotent late-arriving cost. Because provider cost finalizes over 24–48 hours, an append-only writer double-counts on re-run. Key every write on (tenant_id, resource_id, usage_type, timestamp) and upsert, so re-processing a window converges instead of inflating totals. When a fetch fails outright and its cost cannot be resolved, the graceful-degradation and cached-fallback behavior that keeps the pipeline serving stale-but-valid figures is covered by fallback routing for cost APIs, and the structured recovery of partial runs by error handling in cost pipelines.
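The upsert keyed on (tenant_id, resource_id, usage_type, timestamp) can be sketched with an in-memory SQLite table. The table name, column names, and sample values are assumptions made for illustration; a production store would use its own upsert syntax.

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE usage_cost (
        tenant_id TEXT, resource_id TEXT, usage_type TEXT, ts TEXT,
        cost TEXT,  -- stored as text to keep Decimal precision
        PRIMARY KEY (tenant_id, resource_id, usage_type, ts)
    )
""")

def upsert(row):
    """Insert a cost row, or overwrite the cost if the key already exists."""
    conn.execute(
        """
        INSERT INTO usage_cost (tenant_id, resource_id, usage_type, ts, cost)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (tenant_id, resource_id, usage_type, ts)
        DO UPDATE SET cost = excluded.cost
        """,
        row,
    )

key = ("t1", "db-1", "compute", "2024-01-01")
upsert((*key, str(Decimal("10.00"))))  # provisional figure
upsert((*key, str(Decimal("10.42"))))  # finalized figure for the same key
upsert((*key, str(Decimal("10.42"))))  # re-run of the same window

rows = conn.execute("SELECT cost FROM usage_cost").fetchall()
total = sum((Decimal(cost) for (cost,) in rows), Decimal(0))
```

Three writes leave one row holding the finalized figure, so re-processing the window converges on the same total instead of inflating it.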

Related

Building async Python parsers for AWS Cost Explorer: the full paginated, semaphore-governed reference parser for Cost Explorer.
Handling rate limits when pulling database metrics: adaptive backpressure, token buckets, and circuit breakers for provider throttling.
System View Querying Patterns: extract billable tenant compute from pg_stat_activity, Oracle V$SESSION, and Snowflake metering views.
Schema Validation for Billing Data: enforce the canonical usage-record contract at the ingestion boundary.
Error Handling in Cost Pipelines: recover partial runs, dead-letter malformed payloads, and keep financial reporting intact.
