
CO₂ calculation methodology

How we compute the carbon footprint of your LLM calls — transparent and traceable.

Our methodology: from tokens to gCO₂e
We link LLM usage metadata to environmental impact through three inputs: sourced coefficients, infrastructure context, and grid carbon intensity. This lets CSR and engineering teams trace every number back to its source.

Sourced coefficients

We use the most recent published LCA (Life Cycle Assessment) data: the Mistral LCA 2025 and Carbone 4 / ADEME-aligned references for European models. Every coefficient is labelled "Measured" or "Estimated".

Infrastructure context

Our methodology accounts for data-centre context where the literature supports it — including Power Usage Effectiveness (PUE) — using Gravity Climate 2025 research for OpenAI and Anthropic-class models when estimates are required.

Carbon intensity

We apply real-world or average grid carbon intensity factors (gCO₂e per kWh) aligned with the model's primary hosting regions when building or validating coefficients.

E = T_total × C_model × CI_grid

E = emissions (gCO₂e); T_total = total tokens; C_model = energy cost per token (kWh per token); CI_grid = grid carbon intensity (gCO₂e/kWh).
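As a minimal sketch of the formula above, the function below multiplies the three terms. It assumes the per-token energy cost is expressed in kWh per token so that the units cancel to gCO₂e. The function name, parameter names, and the numeric inputs are illustrative placeholders, not sourced coefficients.

```python
def emissions_g(tokens_total: int, kwh_per_token: float, grid_g_per_kwh: float) -> float:
    """Return emissions in gCO2e: tokens x energy per token x grid intensity."""
    energy_kwh = tokens_total * kwh_per_token  # total energy for the call(s)
    return energy_kwh * grid_g_per_kwh         # convert energy to gCO2e


# Placeholder inputs chosen only to show the arithmetic (hypothetical values).
example = emissions_g(tokens_total=1_000, kwh_per_token=1e-6, grid_g_per_kwh=300.0)
print(f"{example:.3f} gCO2e")
```

With these placeholder inputs, 1,000 tokens at 10⁻⁶ kWh per token on a 300 gCO₂e/kWh grid come to roughly 0.3 gCO₂e.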

In the API today (v6)

The live /track and /estimate endpoints apply one consolidated coefficient per model — grams CO₂e per 1,000 tokens — consistent with the sources above. PUE, regional grid intensity, and network effects are reflected in the literature behind those factors where applicable, rather than as separate runtime multipliers. This keeps responses fast, auditable, and easy to integrate for ISVs (multi-tenant via tenant_id).

Calculation formula
What the API computes from your token counts

CO₂ (gCO₂e) = (tokens_total / 1000) × model_coefficient

where tokens_total = prompt_tokens + completion_tokens

Both counts should match values from your LLM provider (the usage field or equivalent on the response). See the Tokens & usage section in the docs.
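The calculation can be reproduced locally for checking purposes. The sketch below is not the API's implementation; it applies the stated formula to a small excerpt of the coefficient table. The function name `co2_grams` and the `usage` dictionary shape (with `prompt_tokens` and `completion_tokens` keys) are assumptions for illustration.

```python
# Excerpt of the per-model coefficients (gCO2e per 1,000 tokens) from the table.
COEFFICIENTS_G_PER_1K = {
    "gpt-4o": 0.3,
    "mistral-large-2": 2.85,
    "gemini-1-5-flash": 0.075,
}


def co2_grams(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """CO2 (gCO2e) = (tokens_total / 1000) x model_coefficient."""
    tokens_total = prompt_tokens + completion_tokens
    return (tokens_total / 1000) * COEFFICIENTS_G_PER_1K[model]


# Hypothetical usage counts, as they might be reported by a provider response.
usage = {"prompt_tokens": 1200, "completion_tokens": 300}
grams = co2_grams("gpt-4o", usage["prompt_tokens"], usage["completion_tokens"])
print(f"{grams:.2f} gCO2e")
```

Here 1,500 tokens on gpt-4o give 1.5 × 0.3, or about 0.45 gCO₂e. An unknown model name raises `KeyError`, which makes a missing coefficient visible rather than silently returning zero.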

This matches activity data × emission factor accounting, aligned with the GHG Protocol for Scope 3 — category 1 (purchased goods and services). To improve accuracy, the GHG Protocol also recommends accounting for PUE, network carbon intensity, and inference location when data is available.

Reference standards

GHG Protocol

Global standard for greenhouse gas emissions accounting

ghgprotocol.org

ESRS E1

European Sustainability Reporting Standards - Climate change

efrag.org

Scope 3 — Category 1: Emissions from third-party LLM services fall under Scope 3 (value chain indirect emissions), category 1 (purchased goods and services).

CO₂ coefficients by model
| Model | gCO₂e / 1k tokens | Confidence | Source |
| --- | --- | --- | --- |
| gpt-4o | 0.3 | Estimated | Gravity Climate, AI methodology (Grove), ~GPT-4 class |
| gpt-4o-mini | 0.1 | Estimated | Gravity Climate 2025 |
| gpt-4-turbo | 0.35 | Estimated | Gravity Climate 2025 |
| gpt-3.5-turbo | 0.08 | Estimated | Gravity Climate 2025 |
| claude-3-5-sonnet | 0.3 | Estimated | Gravity, GPT-4 class alignment (estimated) |
| claude-3-opus | 0.45 | Estimated | Gravity Climate 2025 |
| claude-3-haiku | 0.1 | Estimated | Gravity Climate 2025 |
| mistral-large-2 | 2.85 | Measured | Mistral LCA 2025 (1.14 gCO₂e / 400 inference tokens; Carbone 4 / ADEME) |
| mistral-small | 0.8 | Estimated | Gravity Climate 2025 |
| mistral-medium | 1.2 | Estimated | Gravity Climate 2025 |
| gemini-1-5-flash | 0.075 | Measured | Google Cloud, inference impact (median ~0.03 g/prompt, 2025 methodology) |
| gemini-1-5-pro | 0.3 | Estimated | Gravity, GPT-4 class alignment (estimated) |
| gemini-2-0-flash | 0.08 | Estimated | Gravity Climate 2025 |
| llama-3-70b | 0.25 | Estimated | Gravity Climate 2025 |
| llama-3-8b | 0.05 | Estimated | Gravity Climate 2025 |

Measured: Mistral Large 2, from the published LCA (e.g. 1.14 gCO₂e for a typical 400-token inference response); Gemini 1.5 Flash, from the median published by Google Cloud for inference (blog "Measuring the environmental impact of AI inference", 2025).
Estimated: models without a public LCA, using factors from Gravity Climate / Grove ("Developing an Emissions Accounting Methodology for AI"), with the GPT-4 class at ~0.30 gCO₂e/1k tokens.
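The measured coefficients can be checked by normalising a per-response figure to 1,000 tokens. The sketch below does this for the Mistral Large 2 value (1.14 gCO₂e per 400 tokens). For Gemini 1.5 Flash, the 400 tokens per prompt is an assumption made here because it reproduces the table value from the ~0.03 g/prompt median; the source does not state the token count. The helper name is hypothetical.

```python
def per_1k_tokens(grams_per_response: float, tokens_per_response: int) -> float:
    """Normalise a per-response emission figure to gCO2e per 1,000 tokens."""
    return grams_per_response / tokens_per_response * 1000


# Mistral Large 2: 1.14 gCO2e for a 400-token response (published LCA figure).
mistral_large_2 = per_1k_tokens(1.14, 400)

# Gemini 1.5 Flash: ~0.03 g per prompt; 400 tokens per prompt is an ASSUMPTION.
gemini_flash = per_1k_tokens(0.03, 400)

print(round(mistral_large_2, 3), round(gemini_flash, 3))
```

The results land on approximately 2.85 and 0.075 gCO₂e per 1,000 tokens, matching the two "Measured" rows.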

Data sources

Mistral AI — Large 2 LCA (2025)

First full peer-reviewed LCA for a large language model, with Carbone 4 and ADEME. Details: mistral.ai.

Google Cloud — Gemini inference

Methodology and orders of magnitude (energy, emissions, water) for Gemini Apps prompts: Measuring the environmental impact of AI inference.

Gravity Climate — AI methodology

"Developing an Emissions Accounting Methodology for AI" (including partnership with Grove). Useful when the provider has not published an LCA: gravityclimate.com.

GHG Protocol — Scope 3

Scope 3 calculation guidance (category 1, purchased goods and services): Corporate Value Chain (Scope 3) Standard · Scope 3 Calculation Guidance.

Limitations and uncertainty
  • Grid mix variability: Actual emissions depend on the datacenter grid mix at inference time, which can vary significantly.
  • Estimated data: For models without published data we use estimates that may diverge from reality.
  • PUE and grid: Aggregated coefficients may not reflect your region or site PUE; local analysis can refine the estimate.
  • LCA scope: Depending on the source, the LCA may exclude some items (e.g. end-user device, or isolate inference vs training/hardware); compare scopes before comparing two studies.
  • Model evolution: Coefficients can become outdated as providers optimize their models.
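One way to make the weight of estimated data visible is to split a total by confidence label. The sketch below is an illustration only: it uses three rows from the coefficient table, and the function name and the `(model, tokens_total)` input shape are assumptions rather than part of the API.

```python
from collections import defaultdict

# model -> (gCO2e per 1,000 tokens, confidence label), copied from the table.
COEFFICIENTS = {
    "mistral-large-2": (2.85, "Measured"),
    "gpt-4o": (0.3, "Estimated"),
    "llama-3-8b": (0.05, "Estimated"),
}


def emissions_by_confidence(usage):
    """Sum gCO2e per confidence label for (model, tokens_total) pairs."""
    totals = defaultdict(float)
    for model, tokens_total in usage:
        coefficient, confidence = COEFFICIENTS[model]
        totals[confidence] += tokens_total / 1000 * coefficient
    return dict(totals)


# Hypothetical usage: 2,000 tokens on mistral-large-2, 10,000 on gpt-4o.
totals = emissions_by_confidence([("mistral-large-2", 2000), ("gpt-4o", 10000)])
estimated_share = totals["Estimated"] / sum(totals.values())
print(totals, f"{estimated_share:.0%} estimated")
```

In this example about a third of the total rests on estimated coefficients, which is the portion most exposed to the limitations listed above.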

CSRD alignment

Our PDF reports include the information typically needed for CSRD-style disclosure:

  • Detailed calculation methodology
  • Coefficient sources with traceability
  • References to standards (GHG Protocol, ESRS E1)
  • Section on limitations and uncertainty
  • Scope 3 cat. 1 alignment statement
  • Equivalents for context
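For teams keeping their own audit trail alongside the PDF reports, a traceable record might look like the sketch below. The class and every field name are hypothetical and do not describe the report format or the API schema; the coefficient and source are taken from the table.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EmissionRecord:
    """One traceable calculation: inputs, coefficient, and where it came from."""
    model: str
    tokens_total: int
    coefficient_g_per_1k: float
    confidence: str  # "Measured" or "Estimated"
    source: str
    ghg_scope: str = "Scope 3, category 1 (purchased goods and services)"

    @property
    def co2_g(self) -> float:
        return self.tokens_total / 1000 * self.coefficient_g_per_1k


record = EmissionRecord("claude-3-haiku", 4000, 0.1, "Estimated", "Gravity Climate 2025")
# Serialise the inputs together with the derived result for an audit log.
line = json.dumps({**asdict(record), "co2_g": record.co2_g}, ensure_ascii=False)
print(line)
```

Storing the coefficient, its confidence label, and its source next to the result means a figure can be recomputed later even if the coefficient table changes.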

Under ESRS E1 and the GHG Protocol, activity-based estimates (tokens × documented coefficient) are tier 2–3 estimation approaches, used when direct supplier data is unavailable. What matters is documenting assumptions, sources, and ways to improve data quality.

Note: confirm the expected level of assurance with your auditor before formal CSRD reporting.
