# LLM CO₂ benchmark (indicative)
Orders of magnitude for product and sustainability teams — same coefficients power the live API, with full citations on the methodology page.
## How to read this table
Values are grams CO₂e per 1,000 total tokens (prompt + completion), using consolidated model-level factors. Real deployments vary by region, hardware, and load, so treat these numbers as indicative: use the table for comparison and planning, then instrument production with the API for auditable totals.
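The arithmetic behind each estimate is a straight scaling of total tokens by the model's factor. A minimal sketch (the helper name `estimate_gco2e` is illustrative, not part of the API):

```python
def estimate_gco2e(prompt_tokens: int, completion_tokens: int,
                   factor_g_per_1k: float) -> float:
    """Grams CO2e for one request: total tokens scaled by the
    model's per-1k-token factor."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * factor_g_per_1k

# e.g. a gpt-4o call (0.3 g/1k) with 1,200 prompt + 800 completion
# tokens: 2,000 total tokens -> 0.6 gCO2e
```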
Peer-reviewed and vendor sources underpin the Measured and Benchmarked rows; the remaining models use transparent, class-based Estimated factors. See Methodology for URLs and formulas.
## Coefficients by model
Snapshot of the coefficient set used by /estimate and /track.

| Model | gCO₂e / 1k tokens | Confidence |
|---|---|---|
| gpt-4o | 0.3 | Benchmarked |
| gpt-4o-mini | 0.1 | Benchmarked |
| gpt-4-turbo | 0.35 | Estimated |
| gpt-3.5-turbo | 0.08 | Estimated |
| claude-3-5-sonnet | 0.3 | Benchmarked |
| claude-3-opus | 0.45 | Benchmarked |
| claude-3-haiku | 0.1 | Benchmarked |
| mistral-large-2 | 2.85 | Measured |
| mistral-small | 0.8 | Estimated |
| mistral-medium | 1.2 | Estimated |
| gemini-1-5-flash | 0.075 | Measured |
| gemini-1-5-pro | 0.12 | Measured |
| gemini-2-0-flash | 0.08 | Measured |
| llama-3-70b | 0.25 | Benchmarked |
| llama-3-8b | 0.05 | Benchmarked |
Need the full source strings and PDF narrative? See the complete table on Methodology.
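For quick planning, the table can be mirrored as a plain lookup and summed over monthly usage. A sketch under stated assumptions: the dict below copies a subset of the factors above, and `monthly_footprint_g` is a hypothetical helper, not an endpoint of the API:

```python
# Per-model factors in gCO2e per 1k total tokens, copied from the
# coefficient table (subset shown for brevity).
FACTORS_G_PER_1K = {
    "gpt-4o": 0.3,
    "gpt-4o-mini": 0.1,
    "claude-3-5-sonnet": 0.3,
    "mistral-large-2": 2.85,
    "gemini-1-5-flash": 0.075,
    "llama-3-70b": 0.25,
}

def monthly_footprint_g(usage: dict[str, int]) -> float:
    """Sum gCO2e across models, given total token counts per model."""
    return sum(tokens / 1000 * FACTORS_G_PER_1K[model]
               for model, tokens in usage.items())

# 5M gpt-4o tokens + 20M gpt-4o-mini tokens:
# 5,000 * 0.3 + 20,000 * 0.1 = 3,500 gCO2e (about 3.5 kg)
```

Swapping a large model for a small one in the same family (e.g. gpt-4o to gpt-4o-mini) cuts the per-token factor by roughly 3x in this table, which is why model choice usually dominates the total.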