
SPUR-Coalition/telemetry

Content Telemetry

Signal format for AI content usage reporting.

This is a preview specification. Field names, event types, and schema structure may change before 1.0.

Problem

AI agents retrieve a content owner's content, use it to generate responses, and sometimes cite it. Today content owners see only the initial retrieval: HTTP requests hitting their servers, or access logs from content repositories. Whether the content actually influenced the response, whether it was cited, whether a user saw the citation, and whether they clicked through is not reported back to them.

Platforms self-report usage metrics (if they report at all), and content owners have no way to verify the numbers or compare across platforms.

Telemetry events

Content Telemetry tracks content through five stages:

Retrieved    →  content fetched over HTTP (content owner can see this today)
  Grounded   →  content loaded into the agent's generation context
    Cited    →  content explicitly referenced in the response
      Displayed  →  user saw it - a reference, or the content embedded in the answer
        Engaged  →  user clicked, copied, shared, or directed the agent to act

The session ties these events together. A session is one bounded interaction from query to outcome, and every event in it, from retrieval through engagement, carries its session ID.
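A minimal sketch of the stage ordering and of session grouping, assuming events arrive as a flat list of dicts that each carry a session_id key (a hypothetical transport shape; the example below nests events inside a session document instead). The names content_grounded, content_cited and content_displayed appear in the example; content_retrieved and content_engaged are assumed by analogy.

```python
from collections import defaultdict
from enum import IntEnum


class Stage(IntEnum):
    """The five stages, ordered so that later stages compare greater."""
    RETRIEVED = 1
    GROUNDED = 2
    CITED = 3
    DISPLAYED = 4
    ENGAGED = 5


# content_retrieved and content_engaged are assumed names; the other three
# event types appear in the v0.1 example session document.
EVENT_STAGE = {
    "content_retrieved": Stage.RETRIEVED,
    "content_grounded": Stage.GROUNDED,
    "content_cited": Stage.CITED,
    "content_displayed": Stage.DISPLAYED,
    "content_engaged": Stage.ENGAGED,
}


def group_by_session(events):
    """Group a flat list of event dicts by their session_id."""
    sessions = defaultdict(list)
    for event in events:
        sessions[event["session_id"]].append(event)
    return dict(sessions)
```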

The gaps between stages show how content was used:

  • Retrieval without grounding - your content was fetched but not used
  • Grounding without citation - your content influenced the answer but you got no credit
  • Citation without engagement - your content was cited but the user didn't click through
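The three gaps can be read off the set of event types reported for one content item in one session. This is a sketch, not a normative check; content_retrieved and content_engaged are assumed event names. Content grounded from an agent cache may have no retrieval event in the session, which this logic tolerates.

```python
def usage_gaps(event_types):
    """Return the gaps named in the spec for one content item in one session.

    event_types: the set of event type strings reported for that item.
    content_retrieved and content_engaged are assumed names.
    """
    seen = set(event_types)
    gaps = []
    if "content_retrieved" in seen and "content_grounded" not in seen:
        gaps.append("retrieved, not grounded: fetched but not used")
    if "content_grounded" in seen and "content_cited" not in seen:
        gaps.append("grounded, not cited: influenced the answer, no credit")
    if "content_cited" in seen and "content_engaged" not in seen:
        gaps.append("cited, not engaged: cited but no click-through")
    return gaps
```

For the FT article in the example below (grounded, cited and displayed, no engagement), this returns a single gap: citation without engagement.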

The grounding event captures the boundary "this content entered the agent's generation context." It is architecture-neutral and decoupled from retrieval: content cached by the agent for days still produces a grounding event in every session it influences.
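A sketch of why grounding is per session rather than per fetch, using a hypothetical agent-side cache. The fetch callable stands in for an HTTP retrieval and runs once per URL; every session that loads the content still emits its own content_grounded event. The data fields mirror the example below; the token count is a crude whitespace word count, not a real tokenizer.

```python
from datetime import datetime, timezone


class ContentCache:
    """Hypothetical agent-side cache: one retrieval, many grounding events."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: url -> (text, last_modified)
        self._store = {}         # url -> (text, last_modified)
        self.retrievals = 0      # number of real fetches performed

    def ground(self, url, content_id):
        """Load content into a session's context and return the grounding event."""
        cached = url in self._store
        if not cached:
            self._store[url] = self._fetch(url)
            self.retrievals += 1
        text, last_modified = self._store[url]
        return {
            "type": "content_grounded",
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "content_url": url,
            "content_id": content_id,
            "data": {
                "scope": "session",
                "cached": cached,
                # Crude stand-in for a token count: whitespace-separated words.
                "tokens_ingested": len(text.split()),
                "content_last_modified": last_modified,
            },
        }
```

Calling ground() from two sessions on the same URL yields two grounding events, the second with cached set to true, and only one fetch.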

Grounding and display record two different kinds of influence: grounding means the content influenced the agent, display means it reached the user. The two diverge as agent experiences move beyond the chat window. An agentic browser, for example, can render to the user a page that never entered a generation context; this is reported as a content_displayed event with display_type: embed and no grounding event.
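A consumer can surface the agentic-browser case by diffing content IDs across the two event types within a session. A minimal sketch, assuming each content event carries content_id as in the example below:

```python
def displayed_without_grounding(events):
    """Content IDs with a content_displayed event but no content_grounded event.

    Expected when a page is rendered directly to the user (for example
    display_type "embed") without entering a generation context.
    """
    grounded = {e["content_id"] for e in events if e["type"] == "content_grounded"}
    displayed = {e["content_id"] for e in events if e["type"] == "content_displayed"}
    return sorted(displayed - grounded)
```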

Design principles

Post-hoc, not pre-declared. Events report what actually happened, not what the agent said it would do at request time. An agent cannot reliably declare how it will use content before reading it.

Observable boundaries, not agent internals. The five event types mark boundary crossings. What happens between them - the fan-out, relevance evaluation, re-ranking, reasoning chains - is internal to the agent and changes constantly. The spec does not model it.

Multiple observers, one event. A content retrieval can be reported by the content owner's CDN, the content owner's origin server, and the AI agent independently. The Content-Telemetry-ID header correlates these into a single corroborated event. Uncorroborated retrievals (no matching agent event) may indicate an agent that does not yet support the telemetry protocol.
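A sketch of correlation by Content-Telemetry-ID. The report shape is hypothetical: each report is a dict with an observer ("cdn", "origin" or "agent") and the telemetry_id value taken from the header. Reports sharing an ID collapse into one retrieval, classified by which sides observed it.

```python
from collections import defaultdict

OWNER_SIDE = {"cdn", "origin"}


def correlate_retrievals(reports):
    """Merge retrieval reports from several observers into one record per ID.

    Returns {telemetry_id: (sorted observers, status)}.
    """
    observers = defaultdict(set)
    for r in reports:
        observers[r["telemetry_id"]].add(r["observer"])

    result = {}
    for tid, seen in observers.items():
        if "agent" in seen and seen & OWNER_SIDE:
            status = "corroborated"
        elif "agent" in seen:
            status = "agent_only"
        else:
            # The owner saw the request but no agent event matched: possibly
            # an agent that does not yet support the telemetry protocol.
            status = "uncorroborated"
        result[tid] = (sorted(seen), status)
    return result
```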

Repo contents

This repository is the standard - the wire format. Publisher-facing accreditation and the SPUR conformance mark are defined separately in the SPUR Content Telemetry Profile, which references this specification by version. The standard defines the privacy mechanism (section 5.5); whether a profile makes any privacy level binding is the profile's choice. See GOVERNANCE.md.

Example

A user asks an AI agent about UK interest rates. The agent grounds its response in a cached FT article, cites it, and shows a link. The user reads the answer and leaves without clicking through.

{
  "schema_version": "0.1",
  "session_id": "660e8400-e29b-41d4-a716-446655440000",
  "agent_id": "copilot-v3",
  "started_at": "2026-03-28T09:00:00Z",
  "events": [
    {
      "type": "content_grounded",
      "timestamp": "2026-03-28T09:00:00Z",
      "content_url": "https://www.ft.com/content/abc123",
      "content_id": "ft:abc123",
      "data": {
        "scope": "session",
        "cached": true,
        "tokens_ingested": 3200,
        "content_last_modified": "2026-03-27T18:30:00Z"
      }
    },
    {
      "type": "turn_started",
      "timestamp": "2026-03-28T09:00:01Z",
      "turn_id": "1",
      "turn": {
        "privacy_level": "intent",
        "query_intent": "question",
        "topics": ["UK economy", "interest rates"]
      }
    },
    {
      "type": "content_cited",
      "timestamp": "2026-03-28T09:00:05Z",
      "turn_id": "1",
      "content_url": "https://www.ft.com/content/abc123",
      "content_id": "ft:abc123",
      "data": {
        "citation_type": "paraphrase",
        "position": "primary"
      }
    },
    {
      "type": "content_displayed",
      "timestamp": "2026-03-28T09:00:05Z",
      "turn_id": "1",
      "content_url": "https://www.ft.com/content/abc123",
      "content_id": "ft:abc123",
      "data": { "display_type": "link" }
    },
    {
      "type": "turn_completed",
      "timestamp": "2026-03-28T09:00:05Z",
      "turn_id": "1",
      "turn": {
        "privacy_level": "intent",
        "response_mode": "standard",
        "response_tokens": 280,
        "ad_rendered": true
      }
    }
  ]
}

The content owner can derive: FT article abc123 was in context for the response, cited as a paraphrase, link was displayed, user never clicked, ads were shown alongside.
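The same derivation as code: a sketch that loads the example session document (trimmed to the fields it reads) and summarises one content item. content_engaged is an assumed event name; none appears in the example, which is why engaged comes out false. ads_shown is a session-level flag taken from turn_completed.

```python
import json

# The example session document above, trimmed to the fields used here.
SESSION = json.loads("""
{
  "schema_version": "0.1",
  "session_id": "660e8400-e29b-41d4-a716-446655440000",
  "events": [
    {"type": "content_grounded", "content_id": "ft:abc123",
     "data": {"cached": true, "tokens_ingested": 3200}},
    {"type": "turn_started", "turn_id": "1"},
    {"type": "content_cited", "turn_id": "1", "content_id": "ft:abc123",
     "data": {"citation_type": "paraphrase", "position": "primary"}},
    {"type": "content_displayed", "turn_id": "1", "content_id": "ft:abc123",
     "data": {"display_type": "link"}},
    {"type": "turn_completed", "turn_id": "1",
     "turn": {"response_tokens": 280, "ad_rendered": true}}
  ]
}
""")


def owner_view(session, content_id):
    """Summarise what one content owner can derive about one item."""
    mine = [e for e in session["events"] if e.get("content_id") == content_id]
    by_type = {e["type"]: e.get("data", {}) for e in mine}
    turns = [e for e in session["events"] if e["type"] == "turn_completed"]
    return {
        "grounded": "content_grounded" in by_type,
        "citation_type": by_type.get("content_cited", {}).get("citation_type"),
        "display_type": by_type.get("content_displayed", {}).get("display_type"),
        "engaged": "content_engaged" in by_type,  # assumed event name
        "ads_shown": any(t["turn"].get("ad_rendered") for t in turns),
    }


print(owner_view(SESSION, "ft:abc123"))
# → {'grounded': True, 'citation_type': 'paraphrase', 'display_type': 'link', 'engaged': False, 'ads_shown': True}
```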

Relationship to other protocols

Content Telemetry is focussed on reporting, while content access protocols (Really Simple Licensing, peek-then-pay, IAB CoMP, bilateral APIs) aim to govern how agents discover and license content. The license_ref field on events connects telemetry to whatever access protocol issued the licence, but the schemas are independent - telemetry works with any access protocol, or none.
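Because the schemas are independent, a consumer can treat license_ref as an opaque string and still reconcile usage against licences. A sketch that counts events per licence and event type; events with no license_ref (content used under no access protocol) are keyed under None. The license_ref values in any test data are hypothetical.

```python
from collections import Counter


def usage_by_licence(events):
    """Count events per (license_ref, event type), treating license_ref as opaque."""
    return Counter((e.get("license_ref"), e["type"]) for e in events)
```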

Request for comment

This specification is open for public comment from 12 June to 10 July 2026.

Feedback is triaged on the issue tracker as it arrives and incorporated into the next revision after the window closes. The wire format is held stable during the window.

Comment is most useful on:

  • The open questions below.
  • Whether the conformance and privacy levels (sections 5.5 and 5.7) are implementable as written by a team building an emitter or consumer.
  • How the five-stage event model fits real agent architectures (section 6.4).
  • Anything that would require an implementer to depend on a particular operator or service to participate. The standard should be implementable from the public schemas alone.
  • Any worked example that does not validate against its schema, or any mismatch between the prose and the schemas.

File an issue on this repository using the available templates: Spec feedback / open question for design questions and proposed changes, and Schema or example bug for concrete defects. Pull requests are welcome for specific schema or text fixes; for larger changes, open an issue first (see CONTRIBUTING.md). Feedback on accreditation, the Compliant tier, or the conformance mark belongs on the profile repository.

Some areas are out of scope for this round. The non-goals are in section 1.3 and the deferred manifest features in section 8.9; please read those before filing. Comments on whether a non-goal is the right call are welcome, provided they say which one and why.

Required fields, event types, and schema structure may all change before 1.0 (section 12). Nothing is settled except the items listed as out of scope.

Open questions in v0.1

This is a preview specification. The following areas are under active discussion and will be refined with implementer input:

Grounding boundary. The spec defines grounding as content entering the generation model's context (sections 4.3 and 6.4). For straightforward RAG pipelines this is clear. For pipelines with multiple processing stages - embedding, re-ranking, summarisation before context insertion - the boundary requires judgement. The spec draws the line at the generation context (not earlier retrieval stages), but edge cases remain. When a re-ranking or summarisation stage is itself a generative model, the multi-step rule in section 6.4 (content entering a sub-agent's generation context is grounded) can pull selection stages back inside the boundary. Input from platform engineering teams building real implementations will sharpen this definition.
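A toy model of the edge case, under simplifying assumptions: each pipeline stage keeps its top items in input order (real re-rankers reorder), and the multi-step rule is read as "any item entering a generative model's context is grounded". With a cross-encoder re-ranker only the final context is grounded; swapping in a generative re-ranker pulls every candidate it sees inside the boundary. Stage names and keep counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipelineStage:
    name: str
    generative: bool  # does this stage place its input in a generative model's context?
    keep: int         # how many items it passes on (simplified: top-k in input order)


def grounded_items(candidates, stages):
    """Return the content IDs that crossed the grounding boundary.

    The final stage is expected to be the generation model itself.
    """
    grounded = set()
    current = list(candidates)
    for stage in stages:
        if stage.generative:
            grounded.update(current)
        current = current[: stage.keep]
    return grounded
```

For candidates a to d, an embedding filter keeping three, a re-ranker keeping two, and final generation, this yields {a, b} with a non-generative re-ranker but {a, b, c} with a generative one.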

Event volume at scale. A single deep-research query can produce 100+ retrieval events and dozens of grounding/citation events. The session document format already handles transport - one POST with all events after the session ends, not one request per event. Volume management beyond that (storage, processing, consumer-side aggregation) is an implementation concern, not a protocol gap. Sampling and aggregation are options for future versions but are not in v0.1; the standard sets no default for reporting granularity, leaving it to profiles and deployments.
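A sketch of the session-document transport: events are buffered in memory and serialised once when the session ends, so a deep-research session with 120 retrievals still produces a single request body. Top-level fields mirror the example; content_retrieved is an assumed event name, and sending the body is left out because the endpoint is deployment-specific.

```python
import json
import uuid
from datetime import datetime, timezone


def utc_now():
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


class SessionRecorder:
    """Buffer events for one session and emit one session document at the end."""

    def __init__(self, agent_id):
        self.doc = {
            "schema_version": "0.1",
            "session_id": str(uuid.uuid4()),
            "agent_id": agent_id,
            "started_at": utc_now(),
            "events": [],
        }

    def record(self, event_type, **fields):
        self.doc["events"].append({"type": event_type, "timestamp": utc_now(), **fields})

    def close(self):
        """Return the single request body for the whole session."""
        return json.dumps(self.doc)


rec = SessionRecorder("research-agent")
for i in range(120):
    rec.record("content_retrieved", content_url=f"https://example.com/doc/{i}")
body = rec.close()
```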

Verification of grounding and citation. Grounding and citation events are reported by the agent, which is also the party that may owe compensation under a licence. In v0.1, manifest signing is informational: consumers may verify signatures but are not required to, and the specification defines no required proof binding an event to its emitter (sections 8.4 and 8.9). The events that attribution depends on are therefore self-reported by the same party that may owe compensation. Verifiable credentials and signed events are deferred (section 8.9). One corroboration mechanism works without signing: the Content-Telemetry-ID field correlates an agent-reported retrieval with an origin- or edge-reported one (section 7.2), but it covers retrieval only; grounding, citation, display, and engagement have no independent observer. Signing, even once required, would prove who reported an event, not that the event is true or that all qualifying events were reported. Input is wanted on what a verification layer should cover and where it belongs. Mechanisms that test truthfulness and completeness rather than origin, such as sampled audits or publisher-seeded canary content, are of particular interest.

Reporting granularity. The standard sets no default for reporting granularity, leaving it to profiles and deployments (see Event volume above). The SPUR profile requires event-level delivery and does not permit aggregation. The open question is whether the standard should say more about sampling and aggregation so that profiles do not each define it separately, and how event-level delivery scales for the highest-volume case. No mechanism is selected in v0.1.
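To make the trade-off concrete, here is one possible aggregation, illustrative only: v0.1 defines no aggregation and the SPUR profile requires event-level delivery. Collapsing events into counts per content item and type discards timestamps, turn IDs, and per-event data such as citation_type.

```python
from collections import Counter


def aggregate(events):
    """Collapse event-level records into counts per (content_id, type).

    Events without a content_id (turn_started, turn_completed) are dropped.
    """
    return Counter(
        (e["content_id"], e["type"]) for e in events if "content_id" in e
    )
```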

Versioning

This repo tracks the specification version. SDK repos have their own release cadences and declare which spec version they support.

Current spec version: 0.1 (preview)
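A sketch of how a consumer or SDK might enforce its declared spec version on incoming session documents. SUPPORTED_SPEC_VERSIONS is a hypothetical name. Because anything may change before 1.0, the check matches exact version strings rather than assuming compatibility across 0.x releases.

```python
SUPPORTED_SPEC_VERSIONS = {"0.1"}  # hypothetical: versions this consumer declares


def check_version(session_doc):
    """Reject session documents whose schema_version this consumer does not declare."""
    version = session_doc.get("schema_version")
    if version not in SUPPORTED_SPEC_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return version
```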

About

Content Telemetry: an open standard for reporting AI content usage
