fix(subgraph): batch idempotency, jsonl checkpoint, resume support #2

Merged
MaksymDS merged 3 commits into master from fix/batch-idempotency-checkpoint
Apr 26, 2026

@MaksymDS
Contributor

Summary

  • Idempotency: _subgraph_batch now checks the trades table before fetching. Markets that resolved >24h ago and already have trades are skipped with a subgraph_skip_already_collected log entry, so a failed batch can be restarted without re-downloading everything.
  • jsonl checkpoint: after every market, appends one line to logs/batch_progress.jsonl with market_id, status (ok/skipped/failed), trades_count, wallets_count, duration_ms, ts. The file is append-only and is never rewritten.
  • Resume: on batch start, reads the existing checkpoint and skips every market_id with status=ok, so a restart picks up where the previous run left off.
  • n_wallets in CollectorResult: _upsert_trades now returns a (trades, wallets) tuple; CollectorResult.n_wallets carries the wallet count to the checkpoint.
  • config: extra="ignore" on Settings to tolerate unknown env vars (e.g. FFLOW_THEGRAPH_API_KEY_SECONDARY).
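
For readers who want a concrete picture of the checkpoint and resume behaviour described above, here is a minimal sketch. It is not the code from this PR: the function names `write_progress` and `load_resume_set` only mirror the helpers named in the commit message, the signatures are hypothetical, and the `ts` format (epoch seconds) is an assumption, since the PR does not state how the timestamp is encoded.

```python
import json
import time
from pathlib import Path


def write_progress(path: Path, market_id: str, status: str,
                   trades_count: int = 0, wallets_count: int = 0,
                   duration_ms: int = 0) -> None:
    """Append one JSON line for a market; existing lines are never rewritten."""
    entry = {
        "market_id": market_id,
        "status": status,  # "ok", "skipped", or "failed"
        "trades_count": trades_count,
        "wallets_count": wallets_count,
        "duration_ms": duration_ms,
        "ts": time.time(),  # assumption: epoch seconds; the real format is not stated
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def load_resume_set(path: Path) -> set[str]:
    """Return the market_ids whose checkpoint entry has status == "ok"."""
    done: set[str] = set()
    if not path.exists():
        return done
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a malformed or partially written line
            if not isinstance(entry, dict):
                continue
            market_id = entry.get("market_id")
            if entry.get("status") == "ok" and isinstance(market_id, str):
                done.add(market_id)
    return done
```

Because only `ok` entries enter the resume set, markets recorded as `failed` or `skipped` are considered again on the next run, which is what makes a restart after a crash safe in this sketch.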

Test plan

  • uv run pytest tests/test_batch_idempotency.py — 11/11 passed
  • uv run pytest — 100 passed, 2 skipped

Depends on: #1 (retroactive/task02c-history)

🤖 Generated with Claude Code

- Add n_wallets field to CollectorResult to carry wallet count from upsert
- _upsert_trades now returns (trades, wallets) tuple
- _subgraph_batch checks trades table before fetching; skips markets with
  existing trades that resolved >24h ago (log: subgraph_skip_already_collected)
- _write_progress(): append-only jsonl, one line per market with
  market_id, status, trades_count, wallets_count, duration_ms, ts
- _load_resume_set(): reads checkpoint, returns set of ok market_ids
- _subgraph_batch: loads resume set on start, skips already-ok markets,
  writes checkpoint entry after every market (ok/skipped/failed)
- config: add extra='ignore' to Settings to tolerate unknown env vars
- TestJsonlCheckpoint: 4 tests for _write_progress (schema, append, one-line-per-call)
- TestLoadResumeSet: 4 tests for _load_resume_set (empty file, ok-only filter, malformed lines)
- test_batch_skips_already_collected_markets: collector.run never called for stale+existing market
- test_batch_writes_jsonl_checkpoint: checkpoint entry has correct fields for successful market
- test_batch_resumes_from_checkpoint: done markets skipped, new markets processed
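
The skip rule from the commit message (existing trades plus a resolution time more than 24h in the past) can be illustrated with a small predicate. This is a sketch under assumptions, not the PR's implementation: `should_skip`, its parameters, and the `STALE_AFTER` constant are hypothetical names, and the real `_subgraph_batch` obtains the trade count from the trades table rather than from an argument.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold taken from the PR description: markets resolved more than 24h ago.
STALE_AFTER = timedelta(hours=24)


def should_skip(resolved_at: Optional[datetime], existing_trades: int,
                now: Optional[datetime] = None) -> bool:
    """Return True when a market is old enough and already has stored trades.

    resolved_at must be timezone-aware; None means the market is unresolved.
    """
    if resolved_at is None or existing_trades <= 0:
        return False  # unresolved, or nothing collected yet: fetch it
    if now is None:
        now = datetime.now(timezone.utc)
    return now - resolved_at > STALE_AFTER
```

A caller would evaluate this before invoking the collector and, when it returns True, record the market as skipped instead of fetching it again.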
MaksymDS merged commit dcc496a into master on Apr 26, 2026
1 check failed
MaksymDS deleted the fix/batch-idempotency-checkpoint branch on May 1, 2026 14:14