
How to calculate an LLM carbon footprint from tokens (coefficient-based)

A practical, audit-friendly method: convert prompt+completion tokens into CO₂e using documented coefficients, then keep traceability for ESRS E1 / Scope 3 discussions.

If you want an LLM carbon footprint that can survive review (and later assurance), you need more than a “grams per answer” headline. The most defensible path is coefficient-based: convert activity data (prompt + completion tokens) into CO₂e using a documented emission factor, then keep traceability around assumptions, versions, and boundaries.

1) Collect tokens as activity data

Your starting point is the usage metadata returned by your LLM provider (or gateway). For each relevant inference event, capture:

  • tokens_total = prompt_tokens + completion_tokens
  • model identity (stable key so it maps to a coefficient table)
  • time window (e.g. the month you report)
  • environment (test vs production, if your reporting boundary differs)
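As a minimal sketch, assuming an OpenAI-style `usage` object from your provider or gateway (the field and model names here are placeholders), an activity record per inference event might look like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceEvent:
    model_key: str         # stable key that maps to a coefficient table row
    prompt_tokens: int
    completion_tokens: int
    period: str            # reporting window, e.g. "2024-05"
    environment: str       # "production" or "test", if your boundary differs

    @property
    def tokens_total(self) -> int:
        # Activity data: prompt + completion tokens
        return self.prompt_tokens + self.completion_tokens

# Hypothetical event built from provider usage metadata
event = InferenceEvent("model-a-2024-05", 820, 310, "2024-05", "production")
print(event.tokens_total)  # 1130
```

Storing events in this shape keeps the later aggregation step a pure function of recorded data.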

2) Use coefficients with clear provenance

Coefficients are where practitioners often get stuck: different providers publish different documents, and the literature varies in its assumptions. The practical solution is to maintain a coefficient table that records, for each model:

  • value (e.g. grams CO₂e per 1,000 tokens)
  • confidence label (measured / benchmarked / estimated)
  • source link or reference
  • revision history (what changed, when, and why)
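A coefficient table with that provenance can be as simple as a versioned dictionary (or a CSV under source control). The values and keys below are illustrative placeholders, not published figures:

```python
# Illustrative coefficient table. Units: grams CO2e per 1,000 tokens.
# Every value here is a placeholder, not a published emission factor.
COEFFICIENTS = {
    "model-a-2024-05": {
        "g_co2e_per_1k_tokens": 2.0,          # placeholder value
        "confidence": "estimated",            # measured / benchmarked / estimated
        "source": "internal-benchmark-2024-06",
        "revision": "2024-06-01: initial estimate",
    },
}
```

Keeping the confidence label and revision next to the value means an auditor can see, per report period, which number was used and why.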

3) Compute CO₂e from tokens

The coefficient-based computation is intentionally simple:

CO₂e (g) = (tokens_total / 1,000) × model_coefficient (gCO₂e per 1,000 tokens)

In practice, you aggregate per model and then sum across models for the period you report. This keeps the calculation fast, repeatable, and easy to audit.
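A sketch of that aggregation, assuming events are reduced to `(model_key, tokens_total)` pairs and the coefficients are placeholder values as above:

```python
from collections import defaultdict

def co2e_grams(tokens_total: int, g_per_1k: float) -> float:
    """CO2e (g) = (tokens_total / 1,000) x coefficient (gCO2e per 1,000 tokens)."""
    return (tokens_total / 1_000) * g_per_1k

def aggregate(events, coefficients) -> dict:
    """Sum tokens per model, then apply each model's coefficient once."""
    tokens_by_model = defaultdict(int)
    for model_key, tokens_total in events:
        tokens_by_model[model_key] += tokens_total
    return {m: co2e_grams(t, coefficients[m]) for m, t in tokens_by_model.items()}

# Hypothetical monthly usage and placeholder coefficients (gCO2e per 1k tokens)
events = [("model-a", 120_000), ("model-a", 80_000), ("model-b", 50_000)]
coeffs = {"model-a": 2.0, "model-b": 0.5}
print(aggregate(events, coeffs))  # {'model-a': 400.0, 'model-b': 25.0}
```

Summing tokens first and multiplying once per model gives the same result as multiplying per event, but makes the per-model subtotals easy to inspect.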

4) Add context: limitations and uncertainty

A strong SEO article might end with the formula; an audit-ready one does not. Your methodology note should explicitly address what you do not model line-by-line:

  • datacenter-level variability (grid mix, location, and workload mix)
  • changes in model versions and operational parameters over time
  • network effects / PUE granularity when you only have aggregated factors
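One lightweight way to make that uncertainty explicit, assuming you can attach low/high bounds to each coefficient (the bounds below are placeholders), is to report a range rather than a single point estimate:

```python
def co2e_range(tokens_total: int, low_g_per_1k: float, high_g_per_1k: float):
    """Return (low, high) grams CO2e for a coefficient interval."""
    scale = tokens_total / 1_000
    return (scale * low_g_per_1k, scale * high_g_per_1k)

# Hypothetical: 250k tokens with a coefficient somewhere in [1.5, 3.0] g/1k
low, high = co2e_range(250_000, 1.5, 3.0)
print(low, high)  # 375.0 750.0
```

Publishing the interval alongside the point estimate is a simple, defensible way to acknowledge coefficient uncertainty without committing to a full sensitivity analysis.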

5) Tie it to the reporting frame (Scope 3 style)

For many organizations consuming third-party LLM services, this computation supports Scope 3, category 1 style discussions: purchased goods and services. However, the “right category” is a boundary decision. When you embed AI into the goods/services you sell, some organizations also discuss category 11 considerations (use of sold products). See: Scope 3 category 1 vs 11 for AI / LLM services.

6) Operational checklist (what to document)

If you want something you can paste into your ESRS E1 / internal audit workflow, document:

  1. What time window was included (and exclusions)
  2. Which model keys were mapped to which coefficient rows
  3. Which coefficient revision was used for each report period
  4. How test vs production was treated (if applicable)
  5. How you handle missing usage fields or provider changes
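The checklist above can be captured as a small, versioned methodology record stored alongside each report. Every field value here is a hypothetical example:

```python
# Hypothetical methodology record persisted with each reporting period
methodology_record = {
    "period": "2024-05",
    "exclusions": ["sandbox traffic before 2024-05-07"],
    "model_to_coefficient_row": {"model-a-2024-05": "rev-2024-06-01"},
    "coefficient_table_revision": "rev-2024-06-01",
    "environments_included": ["production"],
    "missing_data_policy": "drop events without usage metadata; log dropped counts",
}
```

Serializing this record (e.g. as JSON next to the report) gives reviewers a single artifact answering all five checklist questions for that period.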

Disclaimer. This post is educational and not legal advice. CSRD/ESRS requirements and assurance expectations evolve. Use qualified advisors and align disclosures with your company’s double materiality assessment and auditor guidance.