Relay history problem: 77% availability vs 24% recall at 1yr #21

Description

@alltheseas

Problem

Algorithm recall drops from 83% at 7 days to 24% at 1 year. Relay retention is not the main cause — median relay event availability at 1yr is 77% (measured across 277 relays, 42 cache files, 4 profiles via bench/retention-profile.ts).

The dominant factor is relay discovery: a user's current kind-10002 (NIP-65) lists where they write now, not where they wrote a year ago. When users change relays, the link to their old content is lost.

Evidence

  • Relay-side drop (7d→1yr): ~20% relative (e.g. relay.damus.io 94%→76%, nos.lol 95%→77%)
  • Algorithm-side drop (7d→1yr): ~70% relative (83%→24%)
  • Relay retention explains ~1/4 of the recall collapse
  • Historical kind-10002 events are gone — probed 6 relays, all correctly discard old versions per NIP-01 replaceable event semantics
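
The "~1/4" figure can be sanity-checked from the numbers quoted above. The sketch below is a rough back-of-the-envelope check, not the method used by bench/retention-profile.ts: it assumes the two example relays are representative and that a simple ratio of relative drops is a fair approximation of "share explained". The function name `relativeDrop` is made up for illustration.

```typescript
// Relative drop between two percentages, as a fraction of the starting value.
function relativeDrop(from: number, to: number): number {
  return (from - to) / from;
}

// Figures quoted in the Evidence list (7d -> 1yr).
const relayDrops = [
  relativeDrop(94, 76), // relay.damus.io
  relativeDrop(95, 77), // nos.lol
];
const algorithmDrop = relativeDrop(83, 24);

// Assumption: the two example relays stand in for the relay-side drop overall.
const meanRelayDrop =
  relayDrops.reduce((sum, d) => sum + d, 0) / relayDrops.length;

// Rough share of the algorithm-side drop that relay-side loss could account for.
const relayShare = meanRelayDrop / algorithmDrop;

console.log({
  meanRelayDrop: meanRelayDrop.toFixed(2), // "0.19"
  algorithmDrop: algorithmDrop.toFixed(2), // "0.71"
  relayShare: relayShare.toFixed(2), // "0.27"
});
```

Under these assumptions the relay-side share comes out near 0.27, consistent with the "~1/4" estimate.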

Why existing mechanisms don't help

  • Kind-10002 is replaceable: old relay lists are discarded. No relay archives them.
  • NIP-66 (current): monitors relay liveness and capabilities, but not retention depth.
  • Hardcoded archival relay lists: centralizing and fragile.

Proposed solutions

Two independent, complementary protocol changes:

1. NIP-65 archive marker (author-level relay history)

PR: nostr-protocol/nips#2243

When a user migrates from relay A to relay B, their client marks relay A as archive instead of removing it. Clients fetching historical content check archive-tagged relays.

["r", "wss://old-relay.example.com", "archive"]
  • No new event kinds, no NIP-01 changes
  • Backward compatible — clients that don't understand archive ignore unknown markers
  • Opt-in — clients SHOULD NOT add archive tags without user consent
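
A minimal sketch of how a client might handle the proposed marker, assuming the tag shape shown above. The helper names `migrateRelay` and `archiveRelays` and the `userConsents` flag are hypothetical and not taken from the PR; the example URLs are placeholders.

```typescript
// A Nostr tag is an array of strings; kind-10002 uses ["r", url, marker?].
type Tag = string[];

// Hypothetical helper: rewrite a kind-10002 tag list when the user leaves
// `oldUrl` for `newUrl`. With consent the old relay is kept as "archive";
// without it the old relay is dropped, which is what clients do today.
function migrateRelay(
  tags: Tag[],
  oldUrl: string,
  newUrl: string,
  userConsents: boolean,
): Tag[] {
  const result: Tag[] = [];
  for (const tag of tags) {
    const isOldRelay = tag[0] === "r" && tag[1] === oldUrl;
    if (!isOldRelay) {
      result.push(tag);
    } else if (userConsents) {
      result.push(["r", oldUrl, "archive"]);
    }
  }
  // Add the new relay with no marker (read + write under NIP-65) if absent.
  if (!result.some((t) => t[0] === "r" && t[1] === newUrl)) {
    result.push(["r", newUrl]);
  }
  return result;
}

// Hypothetical helper: relays to check when fetching an author's old content.
function archiveRelays(tags: Tag[]): string[] {
  return tags
    .filter((t) => t[0] === "r" && t[2] === "archive")
    .map((t) => t[1]);
}
```

A client that does not know the marker never calls anything like `archiveRelays` and simply treats `"archive"` as an unknown third element, which is where the backward compatibility comes from.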

2. NIP-66 retention-depth tag (relay-level retention discovery)

Monitors probe relays at increasing time depths and publish results:

["retention-depth", "kind:1", "31536000"]

Meaning: "this relay returned kind-1 events at least 1yr old."

  • No baseline computation needed (unlike recall-at-window)
  • Any NIP-66 monitor can add this today
  • bench/retention-profile.ts is the offline proof-of-concept
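
On the consuming side, a client could read these tags roughly as follows. This is a sketch under the assumption that the tag has exactly the three-element shape shown above; `parseRetentionDepths` and `coversAge` are illustrative names, not part of NIP-66 or of bench/retention-profile.ts.

```typescript
type Tag = string[];

interface RetentionDepth {
  kind: number; // event kind the probe covered
  seconds: number; // oldest event age the relay returned, in seconds
}

// Parse proposed tags of the form ["retention-depth", "kind:<n>", "<seconds>"].
// Malformed entries are skipped rather than treated as errors.
function parseRetentionDepths(tags: Tag[]): RetentionDepth[] {
  const out: RetentionDepth[] = [];
  for (const [name, kindField, secondsField] of tags) {
    if (name !== "retention-depth") continue;
    const kindMatch = /^kind:(\d+)$/.exec(kindField ?? "");
    if (!kindMatch || !/^\d+$/.test(secondsField ?? "")) continue;
    out.push({ kind: Number(kindMatch[1]), seconds: Number(secondsField) });
  }
  return out;
}

// True if a monitor reported events of `kind` at least `ageSeconds` old.
function coversAge(tags: Tag[], kind: number, ageSeconds: number): boolean {
  return parseRetentionDepths(tags).some(
    (d) => d.kind === kind && d.seconds >= ageSeconds,
  );
}
```

For example, `coversAge(tags, 1, 31536000)` would hold for a relay whose monitor event carries the tag shown above, and fail for one that only reports shallower depths.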

How they complement each other

| Scenario | archive | retention-depth |
|---|---|---|
| Author migrated, old relay has events | Finds old relay directly | Doesn't know author was there |
| Author stayed, current relay pruned | No archive tags to follow | Flags relay as low-retention |
| Client doesn't support archive | No help | Still guides relay selection |

Either alone improves recall. Together they close the gap from both sides. Both are independent and can ship separately.
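
One way the two signals could be combined when choosing relays for a historical fetch is sketched below. The ordering (archive relays first, then the author's other relays with sufficient reported depth) is an assumption made for illustration, as are the `historicalCandidates` name and the `depthByRelay` map, which stands in for whatever a client has collected from monitor events.

```typescript
interface Candidate {
  url: string;
  reason: "archive" | "retention-depth";
}

// Pick relays to query for an author's events that are `ageSeconds` old.
// authorTags: tags of the author's kind-10002 event.
// depthByRelay: relay URL -> deepest reported retention (seconds) for the kind.
function historicalCandidates(
  authorTags: string[][],
  depthByRelay: Map<string, number>,
  ageSeconds: number,
): Candidate[] {
  const seen = new Set<string>();
  const out: Candidate[] = [];

  // 1. Relays the author explicitly marked as holding old content.
  for (const t of authorTags) {
    if (t[0] !== "r" || !t[1] || t[2] !== "archive" || seen.has(t[1])) continue;
    seen.add(t[1]);
    out.push({ url: t[1], reason: "archive" });
  }

  // 2. The author's other relays, kept only if a monitor saw enough depth.
  for (const t of authorTags) {
    if (t[0] !== "r" || !t[1] || seen.has(t[1])) continue;
    if ((depthByRelay.get(t[1]) ?? 0) >= ageSeconds) {
      seen.add(t[1]);
      out.push({ url: t[1], reason: "retention-depth" });
    }
  }
  return out;
}
```

Relays with no reported depth are skipped in step 2, so a client using a scheme like this would still need a fallback for authors whose relays are not monitored.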

Data

Full retention profile: deno task retention (zero network, reads from phase2 cache)

Top relays by 1yr availability (≥20 authors):

  • relay.ditto.pub: 98% (61 authors)
  • nostr-pub.wellorder.net: 93% (289 authors)
  • nostr21.com: 86% (56 authors)
  • nos.lol: 77% (1295 authors)
  • relay.damus.io: 76% (1451 authors)
