Blog · 8 min read

Where to read prompt and completion tokens (OpenAI-style APIs) for carbon accounting

A field-by-field guide to usage metadata: what to log after each LLM call, how it maps to /track, and why tokenizer estimates are a last resort in production.

Carbon accounting for LLM inference starts with activity data: how many tokens were processed, for which model, in which period. Most hosted APIs return that in a usage object (or equivalent) on the response — not in the request. Your job is to persist those numbers next to your business keys (e.g. tenant_id) and forward them to your carbon pipeline.
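As a minimal sketch, extracting that usage object might look like the following. The response shape here is illustrative of OpenAI-style chat completions; field names vary by provider and API version.

```python
# Sketch: pull the fields needed for carbon accounting out of an
# OpenAI-style response dict. A plain dict stands in for the SDK object.

def extract_usage(response: dict) -> dict:
    """Return model id and token counts; prompt/completion text is not needed."""
    usage = response.get("usage") or {}
    return {
        "model": response.get("model"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
    }

# Example response fragment (shape as commonly returned by OpenAI-style APIs):
resp = {
    "model": "gpt-4o",
    "usage": {"prompt_tokens": 412, "completion_tokens": 88, "total_tokens": 500},
}
record = extract_usage(resp)
```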

What to capture after each call

  • Model identifier — stable string that maps to your coefficient table (e.g. gpt-4o, provider-specific names included).
  • Token counts — prompt_tokens and completion_tokens (or a documented aggregate, if your provider exposes only a total).
  • Timestamp for monthly reporting and trend charts.
  • Environment — test vs live keys, if your reporting boundary treats them differently.
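The four bullets above can be captured in a single log record. This schema is hypothetical; adapt the field names to whatever your logging pipeline already uses.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class UsageRecord:
    # Hypothetical record shape: one row per LLM call, stored next to
    # your business keys so carbon numbers can be sliced per tenant.
    tenant_id: str
    model: str              # maps to your coefficient table
    prompt_tokens: int
    completion_tokens: int
    timestamp: str          # ISO 8601 UTC, for monthly reporting
    environment: str        # "test" or "live", if your boundary separates them

rec = UsageRecord(
    tenant_id="acme",
    model="gpt-4o",
    prompt_tokens=412,
    completion_tokens=88,
    timestamp=datetime.now(timezone.utc).isoformat(),
    environment="live",
)
row = asdict(rec)  # ready to write to your log store
```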

Why provider counts beat local estimates

Client-side tokenizers and character heuristics are useful for demos; for disclosures and customer-facing dashboards, prefer counts from the API response. They align with what was actually billed and reduce reconciliation gaps when an auditor asks how you derived activity data.
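If a response ever arrives without a usage block, a character heuristic can fill the gap as a last resort. The ~4 characters per token figure below is a rough rule of thumb for English text, not a provider guarantee; flag any rows derived this way so reconciliation can exclude or re-derive them.

```python
def estimate_tokens(text: str) -> int:
    # Crude last-resort estimate: ~4 characters per token for English text.
    # Prefer provider-reported counts; a proper tokenizer (e.g. tiktoken)
    # is closer, but still won't match billing exactly.
    return max(1, len(text) // 4)
```

Whatever fallback you use, store an `estimated` flag next to the counts so audited figures and heuristic figures never mix silently.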

Mapping to a carbon API

Once you have model + token counts, you can call an estimate endpoint for previews or append-only /track for production history. carbon-llm only needs metadata — no prompt body — which keeps privacy and security reviews simpler.
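A /track call can then be as small as the payload below. The endpoint URL and exact field names are assumptions for illustration; the point is that only metadata leaves your system.

```python
import json
from urllib import request

TRACK_URL = "https://api.example.com/track"  # hypothetical endpoint

def build_track_payload(model, prompt_tokens, completion_tokens, tenant_id, ts):
    # Metadata only -- no prompt or completion text is included.
    return {
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "tenant_id": tenant_id,
        "timestamp": ts,
    }

payload = build_track_payload("gpt-4o", 412, 88, "acme", "2025-01-31T12:00:00Z")
# Sending it (commented out here; add auth, retries, and queueing in production):
# req = request.Request(TRACK_URL, data=json.dumps(payload).encode(),
#                       headers={"Content-Type": "application/json"}, method="POST")
# request.urlopen(req)
```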

Tip. If your stack uses streaming, ensure you still read the final usage block (some SDKs expose it on stream completion). Dropping completion tokens systematically understates impact.
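For example, OpenAI's streaming API can include usage on the final chunk when you opt in (via `stream_options={"include_usage": True}`); other providers differ. The sketch below uses mock chunk dicts in that shape to stay self-contained, so the key point survives: keep consuming until the usage-bearing chunk arrives.

```python
# Sketch: accumulate streamed text AND capture the trailing usage block.
# Mock dicts stand in for SDK chunk objects.

def consume_stream(chunks):
    text_parts, usage = [], None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            text_parts.append(choice.get("delta", {}).get("content", ""))
        if chunk.get("usage"):          # typically the last chunk; don't drop it
            usage = chunk["usage"]
    return "".join(text_parts), usage

chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 2}},
]
text, usage = consume_stream(chunks)
```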

Disclaimer. Field names differ by provider; this post is a pattern, not an exhaustive vendor list. Align with your API version and logging strategy.