Building async Python parsers for AWS Cost Explorer

This page walks through the exact Python needed to turn AWS Cost Explorer’s rate-limited, deeply nested GetCostAndUsage responses into deterministic, typed cost records — fetched concurrently, validated on the way in, and safe to feed straight into database chargeback and quota jobs.

At enterprise footprint sizes, the default synchronous boto3 Cost Explorer client is the wrong tool: correlating compute, storage, and I/O spend across hundreds of RDS, Aurora, and DynamoDB resources serializes network latency you cannot afford at month-end, and uncoordinated polling trips ThrottlingException almost immediately. The fix is the same async semaphore-controlled concurrency that governs every ingestion tier in these Metric Extraction & Aggregation Pipelines: bounded fan-out with aioboto3, a deterministic rate governor, capped backoff, and strict schema validation for billing payloads so a malformed group never poisons a downstream quota calculation. This guide builds that parser end to end.

Prerequisites

Confirm the following are in place before running the parser.

IAM permissions: the execution role needs read-only Cost Explorer access. Scope it to least privilege — never reuse an admin credential for read-only extraction, in line with broader access control for cost data.
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AsyncCostExplorerRead",
      "Effect": "Allow",
      "Action": [
        "ce:GetCostAndUsage",
        "ce:GetCostAndUsageWithResources"
      ],
      "Resource": "*"
    }
  ]
}
```
Cost Explorer actions do not support resource-level ARNs, so Resource stays "*"; constrain access with an IAM condition or an SCP restricting the account instead.
Cost Explorer enabled: activate Cost Explorer in the billing console at least 24 hours before the first run, and enable any cost allocation tags you intend to group by.
Python: 3.11 or newer (the orchestration uses asyncio.TaskGroup).
Libraries: install the async AWS client and Pydantic v2.
```
pip install "aioboto3>=13.0" "pydantic>=2.6"
```

Step-by-Step Implementation

The parser defines a strict record contract, gates concurrency with a semaphore, retries throttled calls with capped backoff, validates every group as it is parsed, and fans out across regions with structured concurrency. Each aggregated cost-center total is then a plain sum $C_{g} = \sum_{i \in g} a_i$ over the validated per-group amounts $a_i$, so any variance against the invoice beyond floating-point rounding is a data-quality signal, not a parsing artifact.
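As a worked illustration of the sum $C_{g}$, the sketch below aggregates (group, amount) pairs. It assumes the amounts are kept as `Decimal` values built from the strings Cost Explorer returns, which makes the sum exact; the steps in this guide store amounts as float instead. The `aggregate_by_group` name and the sample figures are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def aggregate_by_group(pairs):
    """Return {group: C_g}, where C_g is the sum of that group's amounts a_i."""
    totals = defaultdict(Decimal)  # Decimal() starts each group at 0
    for group, amount in pairs:
        # Building Decimal from the string avoids binary float rounding.
        totals[group] += Decimal(amount)
    return dict(totals)


# Hypothetical sample amounts, not real billing data.
sample = [
    ("Amazon DynamoDB", "0.10"),
    ("Amazon DynamoDB", "0.20"),
    ("Amazon Relational Database Service", "214.83"),
]
totals = aggregate_by_group(sample)
print(totals["Amazon DynamoDB"])  # 0.30
```

With float inputs the same two DynamoDB amounts would not sum to exactly 0.3, which is why reconciliation against the invoice needs either `Decimal` or an explicit tolerance.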

Step 1 — Define the validated record contract

Model the output shape with Pydantic v2 first. Strict typing guarantees that malformed or truncated API responses fail fast at the boundary instead of propagating None into quota math — the same discipline covered in validating JSON billing payloads with Pydantic.

```
from datetime import datetime
from typing import Dict, List

from pydantic import BaseModel, Field


class CostMetric(BaseModel):
    dimension: str          # e.g. "UnblendedCost"
    amount: float
    unit: str               # e.g. "USD"


class CostRecord(BaseModel):
    time_start: datetime
    time_end: datetime
    group_key: str          # the GroupBy key value, e.g. a linked account id
    metrics: List[CostMetric]
    tags: Dict[str, str] = Field(default_factory=dict)
```

Step 2 — Gate concurrency and back off on throttling

An asyncio.Semaphore is the concurrency governor: coroutines multiplex I/O through aioboto3, but no more than the configured number of requests is ever in flight. This caps concurrency rather than requests per second, so size it conservatively against the throttle rate you observe. The helper below runs any request factory under that gate and retries only on throttling, sleeping a capped exponential delay outside the gate so siblings can proceed. Non-throttling errors re-raise immediately, the same retry contract used across error handling in cost pipelines.

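```
import asyncio
import logging

from botocore.exceptions import ClientError

logger = logging.getLogger("finops.ce_parser")

# Error codes treated as throttling. Cost Explorer reports rate limiting as
# LimitExceededException; the other two are the generic AWS throttling codes.
THROTTLE_CODES = ("Throttling", "ThrottlingException", "LimitExceededException")


async def execute_with_backoff(
    coro_factory,
    semaphore: asyncio.Semaphore,
    max_retries: int = 3,
    base_delay: float = 1.5,
):
    """Run coro_factory() under the concurrency gate, retrying on throttling."""
    last_exc = None
    for attempt in range(max_retries):
        try:
            # The gate is released before any backoff sleep below.
            async with semaphore:
                return await coro_factory()
        except ClientError as exc:
            code = exc.response["Error"]["Code"]
            if code not in THROTTLE_CODES:
                raise  # non-throttling errors propagate immediately
            last_exc = exc
            if attempt == max_retries - 1:
                break  # no point sleeping after the final attempt
            delay = min(base_delay * (2 ** attempt), 30)  # capped exponential
            logger.warning(
                "Throttled on attempt %d; backing off %.2fs", attempt + 1, delay
            )
            await asyncio.sleep(delay)
    raise RuntimeError(
        f"Max retries ({max_retries}) exceeded for Cost Explorer query"
    ) from last_exc
```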
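To make the delay schedule concrete, the sketch below evaluates the same capped exponential formula on its own. The `backoff_delays` and `jittered` names are hypothetical, and the jitter step is an optional addition that the helper above does not perform; it is shown only as one common way to keep concurrent retries from waking at the same instant.

```python
import random


def backoff_delays(max_retries=3, base_delay=1.5, cap=30.0):
    """Capped exponential delays, one per attempt index."""
    return [min(base_delay * (2 ** attempt), cap) for attempt in range(max_retries)]


def jittered(delay, rng=random):
    """Full jitter: a uniform delay in [0, delay] to de-synchronize retries."""
    return rng.uniform(0.0, delay)


print(backoff_delays())               # [1.5, 3.0, 6.0]
print(backoff_delays(max_retries=7))  # [1.5, 3.0, 6.0, 12.0, 24.0, 30.0, 30.0]
```

With the defaults, the cap of 30 seconds only comes into play from the sixth attempt onward, so raising `max_retries` lengthens the worst case by at most 30 seconds per extra attempt.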

Step 3 — Fetch and validate into typed records

Open the async ce client, issue get_cost_and_usage through the backoff helper, then validate each group as a CostRecord. A malformed group (a ValidationError, a missing key, or a non-numeric Amount) is logged and skipped; it never aborts the batch or leaks a partial row downstream.

```
import asyncio
from datetime import datetime
from typing import Dict, List, Optional

import aioboto3
from pydantic import ValidationError

# CostRecord, CostMetric, execute_with_backoff and logger come from Steps 1 and 2.


async def fetch_cost_data(
    session: aioboto3.Session,
    semaphore: asyncio.Semaphore,
    start_date: str,
    end_date: str,
    region: str = "us-east-1",      # Cost Explorer is global via us-east-1
    granularity: str = "DAILY",
    group_by: Optional[List[Dict[str, str]]] = None,
) -> List[CostRecord]:
    query = {
        "TimePeriod": {"Start": start_date, "End": end_date},
        "Granularity": granularity,
        "Metrics": ["UnblendedCost"],
    }
    if group_by:
        query["GroupBy"] = group_by

    async with session.client("ce", region_name=region) as ce:
        response = await execute_with_backoff(
            lambda: ce.get_cost_and_usage(**query), semaphore
        )

    records: List[CostRecord] = []
    for result in response.get("ResultsByTime", []):
        period = result["TimePeriod"]
        for group in result.get("Groups", []):
            try:
                metric = group["Metrics"]["UnblendedCost"]
                records.append(
                    CostRecord(
                        time_start=datetime.fromisoformat(period["Start"]),
                        time_end=datetime.fromisoformat(period["End"]),
                        group_key=group["Keys"][0] if group.get("Keys") else "ungrouped",
                        metrics=[
                            CostMetric(
                                dimension="UnblendedCost",
                                amount=float(metric["Amount"]),
                                unit=metric["Unit"],
                            )
                        ],
                        tags=dict(group.get("Tags", {})),
                    )
                )
            except (ValidationError, KeyError, ValueError) as exc:
                # One bad group is logged and skipped; the batch continues.
                logger.error("Skipping malformed CE group: %s", exc)
                continue
    return records
```

Step 4 — Fan out across regions with structured concurrency

asyncio.TaskGroup (Python 3.11+) owns the lifetimes of every regional fetch: if one raises, siblings are cancelled cleanly instead of leaking connections. A single shared aioboto3.Session and a single shared Semaphore keep total concurrency bounded even as the fan-out scales; a separate semaphore per task would gate nothing, because each task issues only one request.

```
import asyncio

import aioboto3


async def collect_all_regions(
    regions: List[str],
    start_date: str,
    end_date: str,
    max_concurrency: int = 5,
) -> List[CostRecord]:
    """Fetch service-grouped daily cost across regions concurrently."""
    session = aioboto3.Session()
    # One gate shared by every task, so max_concurrency is a global ceiling.
    semaphore = asyncio.Semaphore(max_concurrency)
    group_by = [{"Type": "DIMENSION", "Key": "SERVICE"}]
    results: List[CostRecord] = []

    async with asyncio.TaskGroup() as tg:
        tasks = [
            tg.create_task(
                fetch_cost_data(
                    session,
                    semaphore,
                    start_date,
                    end_date,
                    region=region,
                    group_by=group_by,
                )
            )
            for region in regions
        ]
    # Leaving the TaskGroup block means every task has finished.
    for task in tasks:
        results.extend(task.result())
    return results


if __name__ == "__main__":
    rows = asyncio.run(
        collect_all_regions(["us-east-1"], "2026-06-01", "2026-06-08")
    )
    for row in rows[:3]:
        print(row.model_dump())
```
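One caveat when passing more than one region: Cost Explorer is reached through a single endpoint (the us-east-1 default noted in Step 3), so the client's region does not appear to scope the returned data by itself. A per-region split would normally come from the query's `Filter` on the `REGION` dimension. The sketch below builds such a query; `build_region_query` is a hypothetical helper, and wiring a `Filter` into `fetch_cost_data` is left as an assumption about how you would extend it.

```python
def build_region_query(region, start_date, end_date):
    """Build GetCostAndUsage parameters scoped to one region via a Filter."""
    return {
        "TimePeriod": {"Start": start_date, "End": end_date},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
        # The REGION dimension filter scopes the data, not the client region.
        "Filter": {"Dimensions": {"Key": "REGION", "Values": [region]}},
    }


queries = [
    build_region_query(region, "2026-06-01", "2026-06-08")
    for region in ("us-east-1", "eu-west-1")
]
print(queries[1]["Filter"])
```

Without a filter like this, each task in a multi-region fan-out may return the same account-wide rows, which would inflate any total summed across tasks.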

In sequence, a single fetch passes the semaphore gate, retries on throttling with capped backoff, validates each result, and excludes malformed payloads.

Verification

Confirm the parser returns trustworthy, typed records before wiring it into any attribution job.

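Assert the record shape. Every returned object is a CostRecord with a resolved currency and a non-negative amount. Credits and refunds show up as negative amounts, so relax the amount assertion if your account receives them, and adjust the unit if you are not billed in USD.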

```
rows = asyncio.run(collect_all_regions(["us-east-1"], "2026-06-01", "2026-06-08"))
for r in rows:
    assert isinstance(r, CostRecord)
    assert r.metrics[0].amount >= 0.0
    assert r.metrics[0].unit == "USD"
print(f"validated {len(rows)} cost records")
```

Expected shape for a service-grouped daily run:

```
{'time_start': datetime.datetime(2026, 6, 1, 0, 0), 'time_end': datetime.datetime(2026, 6, 2, 0, 0), 'group_key': 'Amazon Relational Database Service', 'metrics': [{'dimension': 'UnblendedCost', 'amount': 214.83, 'unit': 'USD'}], 'tags': {}}
{'time_start': datetime.datetime(2026, 6, 1, 0, 0), 'time_end': datetime.datetime(2026, 6, 2, 0, 0), 'group_key': 'Amazon DynamoDB', 'metrics': [{'dimension': 'UnblendedCost', 'amount': 47.12, 'unit': 'USD'}], 'tags': {}}
```

Reconcile against the console. In Cost Management > Cost Explorer, group by Service over the same window. Summing each group_key’s amounts must match the console totals within rounding.
Force a throttle in a scratch account. Raise max_concurrency far above your quota and confirm the logs show Throttled on attempt N; backing off … rather than an uncaught ClientError.
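The console reconciliation can also be scripted. The sketch below assumes you have copied the console's per-service totals into a dict by hand; `reconcile`, the one-cent tolerance, and the sample figures are all hypothetical. Rows are plain `(group_key, amount)` pairs, which you could derive from the parser output as `(r.group_key, r.metrics[0].amount)`.

```python
import math
from collections import defaultdict


def reconcile(rows, console_totals, tolerance=0.01):
    """Return {group_key: (parsed_total, console_total)} for mismatched groups."""
    parsed = defaultdict(list)
    for group_key, amount in rows:
        parsed[group_key].append(amount)
    mismatches = {}
    # Only groups listed in console_totals are checked.
    for group_key, expected in console_totals.items():
        total = math.fsum(parsed.get(group_key, []))  # accurate float summation
        if not math.isclose(total, expected, abs_tol=tolerance):
            mismatches[group_key] = (total, expected)
    return mismatches


# Hypothetical figures for illustration only.
rows = [
    ("Amazon DynamoDB", 47.12),
    ("Amazon DynamoDB", 52.88),
    ("Amazon Relational Database Service", 214.83),
]
console = {
    "Amazon DynamoDB": 100.00,
    "Amazon Relational Database Service": 214.80,
}
mismatches = reconcile(rows, console)
print(sorted(mismatches))  # ['Amazon Relational Database Service']
```

An empty result means every listed group matched within the tolerance; anything else points at a missing page, a dropped group, or a window mismatch.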

Gotchas & Edge Cases

GetCostAndUsage is paginated. Large groupings return a NextPageToken; the parser above reads only the first page. For anything spanning many accounts or a wide time range, loop on the token — this is exactly where batch processing for historical metrics takes over, chunking windows and preserving ordering.
Blended vs unblended matters. Requesting BlendedCost averages reserved-instance and Savings Plan coverage across the org, so a single-tenant attribution built on it is systematically wrong. Use UnblendedCost, and add AmortizedCost when you need reservation-aware chargeback.
ResultsByTime has no Groups when ungrouped. A query without GroupBy returns totals under a Total key, not Groups, so the group loop yields nothing. Branch on whether GroupBy was supplied.
Cost Explorer data is not real-time. Line items finalize 24–48 hours after usage and can restate. Treat a fresh window as provisional and re-pull it, rather than appending — for sub-minute signals reach for real-time metric streaming against CloudWatch instead of Cost Explorer.
Tag propagation lag. A newly activated cost allocation tag can take up to 24 hours to appear in grouped results, so tags may be empty on recent records even when the resource is tagged.
The API itself costs money. Each paginated GetCostAndUsage request is billable. Aggressive fan-out without the semaphore gate inflates both your throttle rate and your Cost Explorer bill.
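The ungrouped case from the list above can be handled by branching on whether GroupBy was supplied. A minimal sketch follows, assuming the response shape described there (figures under `Total` when ungrouped, under `Groups` otherwise); `iter_amounts` and the sample payloads are hypothetical.

```python
def iter_amounts(result, metric="UnblendedCost", grouped=True):
    """Yield (key, amount_string, unit) from one ResultsByTime entry."""
    if grouped:
        for group in result.get("Groups", []):
            value = group["Metrics"][metric]
            yield group["Keys"][0], value["Amount"], value["Unit"]
    else:
        # Without GroupBy, the figures sit under "Total" instead of "Groups".
        value = result["Total"][metric]
        yield "ungrouped", value["Amount"], value["Unit"]


# Hypothetical payload fragments for illustration only.
grouped_result = {
    "Groups": [
        {
            "Keys": ["Amazon DynamoDB"],
            "Metrics": {"UnblendedCost": {"Amount": "47.12", "Unit": "USD"}},
        }
    ],
    "Total": {},
}
ungrouped_result = {
    "Groups": [],
    "Total": {"UnblendedCost": {"Amount": "261.95", "Unit": "USD"}},
}

print(list(iter_amounts(grouped_result)))
print(list(iter_amounts(ungrouped_result, grouped=False)))
```

Passing the `grouped` flag explicitly, rather than inferring it from an empty `Groups` list, keeps a genuinely empty grouped day from being mistaken for an ungrouped query.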

Frequently Asked Questions

Why use aioboto3 instead of running boto3 in a thread pool?

A thread pool works but caps out fast: each blocking boto3 call holds a thread for the full round-trip, so hundreds of concurrent regional or account fetches mean hundreds of threads and heavy context-switching. aioboto3 multiplexes those waits on a single event loop, and the asyncio.Semaphore gives you a precise, deterministic concurrency ceiling that maps directly to your API quota — something a thread pool’s worker count only approximates.

What concurrency value should the semaphore use?

Start at 5 and raise it only while watching your throttle rate. Cost Explorer’s per-account request rate is low and undocumented as a hard number, so the semaphore should sit at or just below the point where ThrottlingException becomes routine. Because backoff retries are capped, a slightly-too-high value degrades to slower throughput rather than failure, but a much-too-high value wastes billable requests on retries.
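To see what the ceiling actually bounds, the sketch below runs fake requests under a semaphore and records the peak number in flight. No AWS calls are made; `measure_peak` is a hypothetical helper, and the `asyncio.sleep` stands in for network latency.

```python
import asyncio


async def measure_peak(task_count=20, limit=5):
    """Run task_count fake requests under a semaphore; return peak in-flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_request():
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for network latency
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(task_count)))
    return peak


print(asyncio.run(measure_peak()))  # 5
```

The peak never exceeds the limit no matter how many tasks are queued, but the number of requests completed per second still depends on how long each one takes, which is why the value is tuned against the observed throttle rate.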

How do I add cost allocation tags to the grouping?

Pass a tag group in group_by, e.g. [{"Type": "TAG", "Key": "cost_center"}]. The tag must be activated as a cost allocation tag in the billing console first, and there is up to a 24-hour lag before it populates. You can combine a DIMENSION and a TAG group in a single query (Cost Explorer allows up to two GroupBy entries) to attribute per service and per cost center at once.
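A sketch of the combined grouping is shown below. With two GroupBy entries each group carries two keys, while the Step 3 parser keeps only the first, so the second needs separate handling. The `split_keys` helper is hypothetical, and the `<tag key>$<tag value>` format it parses is an assumption about how tag keys come back; verify it against a real response before relying on it.

```python
group_by = [
    {"Type": "DIMENSION", "Key": "SERVICE"},
    {"Type": "TAG", "Key": "cost_center"},
]


def split_keys(keys, tag_key="cost_center"):
    """Split a two-entry Keys list into (service, tag_value)."""
    service, raw_tag = keys
    prefix = tag_key + "$"
    # Assumed format: "<tag key>$<tag value>", with an empty value when untagged.
    tag_value = raw_tag[len(prefix):] if raw_tag.startswith(prefix) else raw_tag
    return service, tag_value or "untagged"


print(split_keys(["Amazon DynamoDB", "cost_center$platform"]))
print(split_keys(["Amazon DynamoDB", "cost_center$"]))
```

Mapping the empty tag value to an explicit "untagged" bucket keeps unattributed spend visible in the chargeback instead of merging it into a blank key.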

Why is a group being silently dropped from my results?

The parser skips any group that fails CostRecord validation, most often because of a missing UnblendedCost metric or a non-numeric Amount. The skip is logged at error level with the underlying exception, so check the logs before assuming data loss. If drops are frequent, the payload contract has drifted: update the Pydantic model rather than continuing to discard records.

Can I reuse one parser instance across many queries?

Yes, and you should. The parser here is a set of functions rather than a class, so "instance" means the shared asyncio.Semaphore and aioboto3.Session: both are safe to share across concurrent fetch_cost_data calls on the same event loop, which is the point, since one governor then enforces the global concurrency ceiling across every query it gates. Create a fresh pair only when you need a separate concurrency budget.

Related

Handling rate limits when pulling database metrics — the sibling pattern for header-driven backoff and preemptive token-bucket throttling.
Validating JSON billing payloads with Pydantic — deeper on the strict model contract this parser validates against.
Graceful degradation when billing APIs are down — extending the backoff path into cached fallback state.
Async Usage Parsing Workflows — the parent topic covering concurrent, validation-first usage ingestion.
