CSRD & Scope 3 for LLMs — Step-by-Step Reporting Guide

How to classify, measure, and disclose the CO₂ footprint of LLM inference under ESRS E1 and GHG Protocol Scope 3. Practical steps for CSRD compliance teams.

CSRD and ESRS E1: what applies to LLM usage?

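The Corporate Sustainability Reporting Directive (CSRD) requires disclosure under the European Sustainability Reporting Standards (ESRS). It applies from FY2024 for the first wave of companies (large public-interest entities that already reported under the NFRD), with other large EU companies phased in for later financial years. ESRS E1 covers climate change, including a full GHG inventory.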

ESRS E1-6 mandates disclosure of Scope 1, Scope 2, and material Scope 3 categories. For companies operating AI products or using LLM APIs, inference emissions can be material; where your materiality assessment finds that they are, they must be quantified.

The GHG Protocol's Scope 3 standard, which ESRS references, classifies purchased cloud API services as Category 1 (Purchased Goods and Services). If you sell AI-powered products, the emissions from your customers' use of them fall under Category 11 (Use of Sold Products).

Activity data: what to measure

Unlike electricity you purchase for your own data centres (Scope 2), the kWh consumed by a third-party LLM API are not directly visible to you. The measurement approach endorsed by emerging frameworks (e.g. GHG Protocol ICT sector guidance) is an activity-based estimate:

Activity data × Emission factor = CO₂e

For LLM inference, the activity data is the number of input tokens and output tokens per model call, counted separately because their energy factors differ. Emission factors convert token counts to energy (Wh) using model-specific benchmarks, then to CO₂e using grid intensity (gCO₂/kWh, so Wh must first be divided by 1,000) for the region where inference runs.

carbon-llm collects token counts automatically and applies these factors in real time, so you never need to handle raw energy data yourself.

Emission factors used by carbon-llm

carbon-llm uses a combination of published research and IEA regional grid intensity data:

Emission factor methodology (summary)

Energy per 1k tokens:
  GPT-4o:         ~0.00030 Wh / 1k tokens (input), ~0.00120 Wh / 1k tokens (output)
  Claude 3.5:     ~0.00025 Wh / 1k tokens (input), ~0.00100 Wh / 1k tokens (output)
  Mistral Large:  ~0.00020 Wh / 1k tokens (input), ~0.00080 Wh / 1k tokens (output)

Grid intensity (gCO₂/kWh):
  US East (Virginia):    ~385 gCO₂/kWh
  EU West (Ireland):     ~290 gCO₂/kWh
  EU Central (Frankfurt): ~320 gCO₂/kWh

Source: IEA Electricity 2024, MLPerf Inference benchmarks, academic literature.
carbon-llm updates factors quarterly.
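
To make the arithmetic concrete, here is a minimal sketch of the activity data × emission factor calculation using the approximate figures listed above. The table and function names (`ENERGY_WH_PER_1K_TOKENS`, `GRID_G_CO2_PER_KWH`, `estimateCo2eGrams`) and the model and region keys are illustrative assumptions, not part of the carbon-llm SDK, and the real factors change with each quarterly update.

```typescript
// Illustrative sketch only: names and keys are hypothetical, not the carbon-llm SDK.
// Values are the approximate factors from the summary above.
type ModelId = "gpt-4o" | "claude-3.5" | "mistral-large";
type RegionId = "us-east" | "eu-west" | "eu-central";

// Energy in Wh per 1,000 tokens, split by input and output.
const ENERGY_WH_PER_1K_TOKENS: Record<ModelId, { input: number; output: number }> = {
  "gpt-4o": { input: 0.0003, output: 0.0012 },
  "claude-3.5": { input: 0.00025, output: 0.001 },
  "mistral-large": { input: 0.0002, output: 0.0008 },
};

// Grid intensity in gCO2 per kWh for the region where inference runs.
const GRID_G_CO2_PER_KWH: Record<RegionId, number> = {
  "us-east": 385,
  "eu-west": 290,
  "eu-central": 320,
};

// Step 1: token counts -> energy in Wh.
function estimateEnergyWh(model: ModelId, inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const factor = ENERGY_WH_PER_1K_TOKENS[model];
  return (inputTokens / 1000) * factor.input + (outputTokens / 1000) * factor.output;
}

// Step 2: energy -> grams of CO2e (Wh is divided by 1000 to match g/kWh).
function estimateCo2eGrams(
  model: ModelId,
  region: RegionId,
  inputTokens: number,
  outputTokens: number,
): number {
  const energyKwh = estimateEnergyWh(model, inputTokens, outputTokens) / 1000;
  return energyKwh * GRID_G_CO2_PER_KWH[region];
}
```

With these factors, 1,000,000 input tokens and 250,000 output tokens on GPT-4o in EU West come to about 0.6 Wh, or roughly 0.174 g CO₂e.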

Step 1 — Integrate carbon-llm

Add one call after each LLM response. The SDK accepts the model name and token counts; it never receives prompt text:

Any backend framework

import { CarbonLLM } from "@carbon-llm/sdk"

const carbon = new CarbonLLM({ apiKey: process.env.CARBON_LLM_API_KEY })

// After your LLM call (`response` is the provider's result, `usage` its token usage, e.g. response.usage):
await carbon.track({
  model: response.model,           // e.g. "gpt-4o"
  inputTokens: usage.prompt_tokens,
  outputTokens: usage.completion_tokens,
  tenantId: org.id,                // optional — for multi-client reports
})
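
In production you may not want a tracking failure to break the user-facing request. The sketch below maps an OpenAI-style `usage` object to the payload shown above and swallows tracking errors. The `Tracker` interface mirrors only the `track` call shown in this guide and assumes it returns a promise; `toTrackEvent`, `trackSafely`, and `LlmResponseLike` are hypothetical helpers, not SDK exports.

```typescript
// Payload shape taken from the carbon.track() call shown above.
interface TrackEvent {
  model: string;
  inputTokens: number;
  outputTokens: number;
  tenantId?: string;
}

// Assumption: covers only the track() method used in this guide.
interface Tracker {
  track(event: TrackEvent): Promise<unknown>;
}

// Assumed OpenAI-style response fields; adapt to your provider.
interface LlmResponseLike {
  model: string;
  usage?: { prompt_tokens: number; completion_tokens: number };
}

// Hypothetical helper: builds the payload from metadata only, never prompt text.
function toTrackEvent(response: LlmResponseLike, tenantId?: string): TrackEvent | null {
  if (!response.usage) return null; // some responses may omit usage
  const event: TrackEvent = {
    model: response.model,
    inputTokens: response.usage.prompt_tokens,
    outputTokens: response.usage.completion_tokens,
  };
  if (tenantId !== undefined) event.tenantId = tenantId;
  return event;
}

// Hypothetical helper: tracking failures are logged, not rethrown.
// Resolves to true when the event was tracked, false otherwise.
async function trackSafely(
  tracker: Tracker,
  response: LlmResponseLike,
  tenantId?: string,
): Promise<boolean> {
  const event = toTrackEvent(response, tenantId);
  if (event === null) return false;
  try {
    await tracker.track(event);
    return true;
  } catch (err) {
    console.warn("carbon tracking failed; continuing", err);
    return false;
  }
}
```

Under these assumptions, the snippet above could be called as `await trackSafely(carbon, response, org.id)`. The trade-off is that silently dropped events leave gaps in the inventory, so it is worth monitoring the warning log.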

Step 2 — Accumulate and review

The dashboard shows cumulative CO₂e by model, by day, and by tenant. Use the date range picker to pull figures for your reporting period (e.g. 1 Jan – 31 Dec).

Key metrics for ESRS E1-6:

• Total CO₂e (metric tonnes) — your Scope 3 inventory line item

• CO₂e by model — supports efficiency analysis and reduction target setting

• CO₂e by tenant — required for multi-client or product-line disclosures
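
If you pull raw figures into your own GHG inventory tooling, the same three views can be reproduced with a simple aggregation. The record shape below (`EmissionRecord`) and the `summarise` function are assumptions for illustration; they do not describe the dashboard's or the export's actual schema.

```typescript
// Hypothetical per-call record; not the carbon-llm export schema.
interface EmissionRecord {
  model: string;
  tenantId: string;
  co2eGrams: number;
}

interface EmissionSummary {
  totalTonnes: number;                    // Scope 3 inventory line item
  gramsByModel: Record<string, number>;   // for efficiency analysis
  gramsByTenant: Record<string, number>;  // for multi-client breakdowns
}

const GRAMS_PER_TONNE = 1e6;

function summarise(records: readonly EmissionRecord[]): EmissionSummary {
  const gramsByModel: Record<string, number> = {};
  const gramsByTenant: Record<string, number> = {};
  let totalGrams = 0;

  for (const record of records) {
    totalGrams += record.co2eGrams;
    gramsByModel[record.model] = (gramsByModel[record.model] ?? 0) + record.co2eGrams;
    gramsByTenant[record.tenantId] = (gramsByTenant[record.tenantId] ?? 0) + record.co2eGrams;
  }

  // Reports are usually stated in metric tonnes, so convert once at the end.
  return { totalTonnes: totalGrams / GRAMS_PER_TONNE, gramsByModel, gramsByTenant };
}
```

Keeping the per-model and per-tenant figures in grams avoids very small decimals for individual rows; only the headline total is converted to tonnes.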

Step 3 — Export ESRS E1 report

Go to Reports → Generate ESRS E1 report. Select:

• Reporting period (start/end date)

• Scope (all tenants, or specific tenants)

• Format (PDF for auditors, JSON for your GHG inventory tool)

The PDF includes: total CO₂e, methodology description, emission factor sources, and a per-model breakdown. This is designed to be attached to your sustainability report as technical evidence.

Step 4 — Set reduction targets (ESRS E1-1)

ESRS E1-1 requires a transition plan with GHG reduction targets. Practical levers for LLM emissions:

Model efficiency: replace GPT-4o with GPT-4o-mini or Claude Haiku for tasks that don't need frontier capability — typically 8–12× lower per-token energy.

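Semantic caching: cache embeddings of frequent queries and return cached responses when a new query is close enough. A 30% cache hit rate reduces the tokens sent to the LLM by roughly the same proportion, assuming cached and uncached requests are of similar size.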

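Batching: schedule non-real-time tasks (report generation, bulk analysis) for hours when grid carbon intensity is lower. These are often, but not always, off-peak hours, so check the intensity profile for the region where inference runs.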

Prompt compression: shorter prompts use fewer input tokens. Libraries like LLMLingua report compression of around 4× with limited accuracy loss on many tasks; validate the effect on your own workload before relying on it.

Document your chosen measures in the transition plan and track progress quarterly using the dashboard's trend view.
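
To prioritise these levers, a rough projection is often enough. The sketch below applies a cache hit rate, a prompt compression ratio, and a relative per-token energy ratio for a smaller model to a baseline. All names are hypothetical, and the simplifying assumptions (cached requests consume no LLM tokens, compression affects input tokens only, energy scales linearly with tokens) are illustrative rather than a description of how carbon-llm calculates anything.

```typescript
// Hypothetical planning model; not part of carbon-llm.
interface Baseline {
  inputTokens: number;
  outputTokens: number;
  whPer1kInput: number;   // baseline model energy, Wh per 1k input tokens
  whPer1kOutput: number;  // baseline model energy, Wh per 1k output tokens
}

interface Levers {
  cacheHitRate: number;       // 0..1, share of requests served from cache
  promptCompression: number;  // >= 1, e.g. 4 means prompts shrink to a quarter
  modelEnergyRatio: number;   // > 0, replacement model energy / baseline (0.1 = 10x lower)
}

function energyWh(b: Baseline): number {
  return (b.inputTokens / 1000) * b.whPer1kInput + (b.outputTokens / 1000) * b.whPer1kOutput;
}

function projectEnergyWh(b: Baseline, l: Levers): number {
  if (l.cacheHitRate < 0 || l.cacheHitRate > 1) throw new RangeError("cacheHitRate must be in [0, 1]");
  if (l.promptCompression < 1) throw new RangeError("promptCompression must be >= 1");
  if (l.modelEnergyRatio <= 0) throw new RangeError("modelEnergyRatio must be > 0");

  const servedShare = 1 - l.cacheHitRate; // requests that still reach the LLM
  const projected: Baseline = {
    ...b,
    inputTokens: (b.inputTokens * servedShare) / l.promptCompression,
    outputTokens: b.outputTokens * servedShare,
  };
  return energyWh(projected) * l.modelEnergyRatio;
}

// Relative reduction versus baseline, as a fraction between 0 and 1.
function projectedReduction(b: Baseline, l: Levers): number {
  const before = energyWh(b);
  return before === 0 ? 0 : 1 - projectEnergyWh(b, l) / before;
}
```

For example, a baseline of 1,000,000 input and 250,000 output tokens at the GPT-4o factors listed earlier is about 0.6 Wh; with a 30% cache hit rate, 4× compression, and a model at one tenth of the per-token energy, the projection is about 0.026 Wh. Treat such numbers as planning estimates and confirm them against measured data in the trend view.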

Double materiality assessment

CSRD requires a double materiality assessment (DMA) before disclosure. For LLM emissions:

Financial materiality: assess whether climate transition risks (carbon taxes, energy price volatility, model provider pricing changes driven by energy costs) could affect your business.

Impact materiality: assess the actual environmental impact of your inference activity. Even if small in absolute terms, it may be material relative to your total Scope 3 footprint or your product's value proposition.

carbon-llm's PDF exports include the data needed to support both axes of your DMA.

Start measuring your LLM emissions today

Free tier · ESRS E1 PDF exports · No prompt logging
