ceac: skip peer-cache rows outside the verdict lookback window #540

Open
catoneone wants to merge 1 commit into main from ceac/anticopy-verdict-lookback-peer-cache

@catoneone

Collaborator

Why

`scores_index` has no TTL — it accumulates rows for every miner
(hotkey, revision) that ever ran through anti-copy, including
deregistered ones. Observed on the validator at the time of writing:

  • `tick: 10/652 rows pending` → 652 R2 blob fetches
  • ~4 MB / blob → 2.6 GB of R2 I/O on every refresh-service
    restart
  • ~70 % of those rows are older than `verdict_lookback_days`, and
    `_pick_origin` already filters them out during the verdict math,
    so fetching them cannot change any verdict
  • On a large table, the post-restart "build peer cache" step took
    10-15+ minutes, blocking the first batch of verdicts behind it.

What

Compute a single floor up front:

```
peer_fb_floor = min(pending.first_block) - lookback_blocks
```

`min()` (not `max()`) anchors off the earliest pending candidate, so
the resulting floor is valid for every pending row this tick will
judge. Rows whose `first_block` is below `peer_fb_floor` are skipped
before the R2 fetch. The log line now reports both the cache size and
the skip count so trim depth is visible per tick:

```
[anticopy.verdict] peer cache built: 192 blobs (skipped 461 outside 7d lookback)
```

`verdict_lookback_days=0` keeps the original "load everything"
behaviour intact (validated by a new test alongside the affirmative
case).
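
The floor computation and the `verdict_lookback_days=0` escape hatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Row`, `peer_floor`, `split_peer_rows`, and the `BLOCKS_PER_DAY` value are hypothetical names and assumptions, and treating an empty pending list as "no filter" is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Assumption: 12-second blocks, i.e. 7200 blocks per day. The PR does not
# state how lookback days are converted to blocks.
BLOCKS_PER_DAY = 7200


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for a scores_index row."""
    hotkey: str
    revision: str
    first_block: int


def peer_floor(
    pending: Sequence[Row],
    lookback_days: int,
    blocks_per_day: int = BLOCKS_PER_DAY,
) -> Optional[int]:
    """Return the lowest first_block worth fetching, or None for no filter."""
    if lookback_days <= 0 or not pending:
        # lookback_days=0 keeps the "load everything" behaviour.
        return None
    lookback_blocks = lookback_days * blocks_per_day
    # min() anchors on the earliest pending candidate, so the floor is
    # valid for every candidate judged in this tick.
    return min(row.first_block for row in pending) - lookback_blocks


def split_peer_rows(
    all_rows: Sequence[Row],
    pending: Sequence[Row],
    lookback_days: int,
    blocks_per_day: int = BLOCKS_PER_DAY,
) -> Tuple[List[Row], int]:
    """Return (rows to fetch, number of rows skipped before any fetch)."""
    floor = peer_floor(pending, lookback_days, blocks_per_day)
    keep: List[Row] = []
    skipped = 0
    for row in all_rows:
        if floor is not None and row.first_block < floor:
            skipped += 1  # below the floor: no blob fetch is issued
            continue
        keep.append(row)
    return keep, skipped
```

The skip count is returned alongside the kept rows so that a caller can emit both numbers in one log line, as in the example above.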

Test plan

  • `pytest tests/test_anticopy_verdict_backfill.py` — 8 passed
    (5 existing + 2 new: skip-outside-window + lookback-zero-disables-filter).
  • Validator deploy: confirm `peer cache built: X blobs
    (skipped Y outside Nd lookback)` log appears with X+Y == `list_all`
    size, and that the tick completes proportionally faster than the
    pre-PR 2.6 GB pull.
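
For the deploy check, one way to verify X+Y against the `list_all` size is to parse the log line. The sketch below assumes the log format is exactly as quoted in this description; `parse_peer_cache_log` and `counts_match` are hypothetical helpers, not part of the PR.

```python
import re
from typing import Optional, Tuple

# Pattern derived from the log line quoted in the PR description.
LOG_RE = re.compile(
    r"peer cache built: (\d+) blobs \(skipped (\d+) outside (\d+)d lookback\)"
)


def parse_peer_cache_log(line: str) -> Optional[Tuple[int, int, int]]:
    """Return (loaded, skipped, lookback_days), or None if the line does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    loaded, skipped, days = (int(group) for group in match.groups())
    return loaded, skipped, days


def counts_match(line: str, list_all_size: int) -> bool:
    """True when loaded + skipped equals the expected table size."""
    parsed = parse_peer_cache_log(line)
    return parsed is not None and parsed[0] + parsed[1] == list_all_size
```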

Notes

Doesn't touch the actual verdict math or row-level decision criteria —
strictly an I/O / memory optimisation for the cache-build step. The
verdict outcomes are identical (the rows we now skip were already
being filtered out by `_pick_origin`'s own lookback check).
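
A small check of why the `min()` anchor leaves verdicts unchanged while a `max()` anchor would not. It assumes `_pick_origin` treats a peer as eligible for a candidate when the peer's `first_block` is no more than `lookback_blocks` below the candidate's. That form is an assumption, since the function body is not shown here, and `in_window` and `wrongly_dropped` are hypothetical names.

```python
from typing import List, Sequence


def in_window(peer_fb: int, cand_fb: int, lookback_blocks: int) -> bool:
    # Assumed shape of the per-candidate lookback check.
    return peer_fb >= cand_fb - lookback_blocks


def wrongly_dropped(
    peers: Sequence[int],
    pending: Sequence[int],
    lookback_blocks: int,
    floor: int,
) -> List[int]:
    """Peers below the floor that some pending candidate could still use."""
    return [
        peer
        for peer in peers
        if peer < floor
        and any(in_window(peer, cand, lookback_blocks) for cand in pending)
    ]


pending = [1500, 2000]          # first_block of each pending candidate
peers = [100, 600, 1200, 1900]  # first_block of each cached peer row
lookback_blocks = 1000

floor_min = min(pending) - lookback_blocks  # anchor used by this PR
floor_max = max(pending) - lookback_blocks  # the rejected alternative

# With the min() anchor nothing still in any candidate's window is skipped.
dropped_with_min = wrongly_dropped(peers, pending, lookback_blocks, floor_min)
# With a max() anchor, the peer at block 600 would be skipped even though
# the candidate at block 1500 can still see it.
dropped_with_max = wrongly_dropped(peers, pending, lookback_blocks, floor_max)
```

Because `floor_min` is at or below every candidate's own cutoff, any row it removes is already outside every per-candidate window under this assumption.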

scores_index has no TTL, so the table grows unbounded as miners
register, deregister, and upload new revisions. Observed: 653 rows /
~2.6 GB of R2 pulls on every refresh-service restart, even though >70%
are older than verdict_lookback_days and can never influence a
verdict. _pick_origin already filters peers older than the same
window during the verdict math, so the R2 work was pure overhead.

Compute peer_fb_floor = min(pending.first_block) - lookback_blocks
once per tick and skip rows whose first_block is below it before
issuing the R2 fetch. Earliest-pending anchor keeps the floor valid
for every pending candidate this tick will judge. lookback_days=0
disables the filter (matches the existing 'all rows always loaded'
behaviour expected by the older fixtures).

Logged at INFO: peer cache size + how many rows were skipped, so an
operator can sanity-check the trim depth across a deploy.