Task 02D: price reconstruction, UMA fix, ILS scoring, Phase 0 fixture #4

Merged
MaksymDS merged 28 commits into master from task02d/price-reconstruction-and-uma
Apr 27, 2026

@MaksymDS
Contributor

Summary

  • Phase 1: Trade VWAP price reconstruction (reconstruct_price_series), price_source field added to market_labels
  • Phase 2: UMA collector rewrite — fixed subgraph auth (API key in URL path), added RPC eth_getLogs fallback with ABI decoding of Settle events
  • Phase 3: Two new CLI commands: fflow news tier1-batch (seeds T_news from UMA evidence URLs) and fflow news seed-proxy (seeds synthetic T_news as end_date - N days, tier=2)
  • Phase 4: ILS scoring succeeded for 4 of the 24 FFICD validation markets (16 had no trade data, 4 hit proxy T_news edge cases); results in reports/TASK_02D_ILS_FFICD_RESULTS.md
  • Fixture: scripts/make_foresightflow_fixture.py rewritten — hard cutoff 2025-09-15 invariant, NegRisk + secondary bucket exclusion, per-category quota (6 categories), Brier calibration check; generates data/fixture_phase0.jsonl (50 markets, 52% YES)
  • Bug fixes: PriceLookupError handling in pipeline, t_open snapping to first trade, NUMERIC(20,6) for pre_news_max_jump, migration 0004

Phase 5 status

BLOCKED pending review of Phase 4 results. See reports/TASK_02D_ILS_FFICD_RESULTS.md — recommendation is to improve T_news quality before running control group comparison.

Test plan

  • uv run pytest — all existing tests pass
  • Review Phase 4 ILS results report
  • Approve or redirect Phase 5 plan

🤖 Generated with Claude Code

MaksymDS and others added 28 commits April 26, 2026 16:11
Runs SubgraphCollector + ClobCollector for all 24 markets across the
documented case set (fficd-001 through fficd-008), regardless of volume
threshold. Market IDs resolved by prefix LIKE lookup so real condition IDs
are used at runtime.

Features: idempotency skip (trades exist + resolved >24h), "bad indexers"
fast-fail, 4h runtime cap, append-only JSONL log, markdown status report.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
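
The idempotency skip described in the commit above (trades exist and the market resolved more than 24h ago) can be sketched as a small predicate. This is a minimal illustration, not the collector's actual code: the function name `should_skip` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_skip(
    trade_count: int,
    resolved_at: Optional[datetime],
    now: datetime,
    settle_window: timedelta = timedelta(hours=24),
) -> bool:
    """Return True when a market can be skipped on a re-run.

    Hypothetical helper: a market is skipped only if it already has trades
    AND it resolved more than `settle_window` ago, so no new trades are
    expected to arrive.
    """
    if trade_count == 0 or resolved_at is None:
        return False  # nothing collected yet, or the market is still open
    return now - resolved_at > settle_window


now = datetime(2026, 4, 26, 16, 0, tzinfo=timezone.utc)
print(should_skip(120, now - timedelta(hours=30), now))  # resolved long ago
print(should_skip(120, now - timedelta(hours=2), now))   # resolved recently
```

Keeping the rule as a pure function makes the re-run behaviour easy to unit test without a database.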
…des, 0 errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…des, 0 errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…des, 3 indexer skips

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…des, rate 3.3 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…des, 3.5 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ades, 3.7 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ades, 3.7 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ades, 4.2 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…es, 5.6 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…des, 7.8 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rades, 9.7 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rades, 11.1 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rades, 13.9 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rades, 15.7 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rades, 19.6 m/min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rades, 23 m/min, ETA ~1.8h

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…des, 26.4 m/min, ~31min left

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- scripts/make_foresightflow_fixture.py: reusable JSONL fixture generator
  for coordination experiment. Phase 0 (50 markets) and Phase 1A (2000).
  Primary baselineMidPrice from CLOB prices table; --allow-trade-vwap flag
  enables VWAP fallback from trades table >24h before resolution.
  6-category mapping: keyword → fflow taxonomy → fallback.
- TASK_02C_RESULTS.md: Phase 3B marked COMPLETE (17.9M trades, 796K wallets,
  10,410 markets, 11 bad-indexer skips). Phase 3C marked READY TO RUN.
  Data collection table updated with final subgraph count.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Etherscan deprecated the V1 Polygonscan endpoint (api.polygonscan.com/api).
V2 is at api.etherscan.io/v2/api with chainid=137 injected per-request.

- config.py: default polygonscan_url → https://api.etherscan.io/v2/api
- polygonscan.py: _get() prepends chainid=137 to every request
- .env: FFLOW_POLYGONSCAN_URL updated (was api.polygonscan.com)

BLOCKER: local DNS resolver returns NXDOMAIN for api.etherscan.io.
Workaround (requires user sudo):
  echo "23.92.68.154 api.etherscan.io" | sudo tee -a /etc/hosts
Or change system DNS to 8.8.8.8 in Network Preferences.
API confirmed working via resolved IP.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
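
To illustrate the per-request `chainid=137` injection described in the commit above, here is a minimal sketch of how the query string might be built. The helper name `build_request_url` is hypothetical and the real `_get()` in polygonscan.py may be structured differently; the example query parameters are for illustration only.

```python
from urllib.parse import urlencode

ETHERSCAN_V2_URL = "https://api.etherscan.io/v2/api"  # default from config.py
POLYGON_CHAIN_ID = 137


def build_request_url(params: dict, base_url: str = ETHERSCAN_V2_URL) -> str:
    """Prepend chainid to the caller's query parameters (hypothetical helper)."""
    # chainid goes first; a caller-supplied chainid would override the value
    # but keep the first position in the query string.
    merged = {"chainid": POLYGON_CHAIN_ID, **params}
    return f"{base_url}?{urlencode(merged)}"


print(build_request_url({"module": "account", "action": "txlist", "address": "0xabc"}))
```

Injecting the chain ID in one place keeps individual call sites unchanged when the endpoint moves from the V1 to the V2 API.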
polygonscan:
- run() and _get_stale_wallets() gain min_trades param; queries trades
  table via JOIN to select only wallets with >= N trades (ordered by
  trade count DESC — most active first)
- Progress log every 100 wallets during batch (polygonscan_batch_progress)
- CLI: --min-trades flag wired through

With --min-trades 100: 11,393 wallets (~6.3h) vs 796K full set (440h+)

make_foresightflow_fixture.py:
- Progress log every 500 candidates scanned
- os.makedirs for output directory if needed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
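
The `min_trades` selection in the commit above (wallets with at least N trades, most active first) can be sketched with an in-memory SQLite database. The table and column names below are assumptions made for the example, not the project's schema, and the staleness filter implied by `_get_stale_wallets()` is omitted.

```python
import sqlite3


def select_active_wallets(conn: sqlite3.Connection, min_trades: int) -> list:
    """Return wallet addresses with >= min_trades trades, most active first."""
    rows = conn.execute(
        """
        SELECT w.address, COUNT(t.id) AS n_trades
        FROM wallets AS w
        JOIN trades AS t ON t.wallet = w.address
        GROUP BY w.address
        HAVING COUNT(t.id) >= ?
        ORDER BY n_trades DESC
        """,
        (min_trades,),
    ).fetchall()
    return [address for address, _ in rows]


# Hypothetical toy schema and data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wallets (address TEXT PRIMARY KEY);
    CREATE TABLE trades (id INTEGER PRIMARY KEY, wallet TEXT);
    """
)
conn.executemany("INSERT INTO wallets VALUES (?)", [("0xa",), ("0xb",), ("0xc",)])
conn.executemany(
    "INSERT INTO trades (wallet) VALUES (?)",
    [("0xa",)] * 3 + [("0xb",)] * 5 + [("0xc",)],
)
active = select_active_wallets(conn, min_trades=3)
print(active)
```

Ordering by trade count means that an interrupted batch has still covered the most active wallets.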
scripts/diagnose_clob_coverage.py: 7-step diagnostic covering:
1. Basic prices table stats (409 markets, 1.12M rows, Apr 13-26 window)
2. ILS-target coverage: 3/11,263 markets (0.0%) — confirms gap
3. FFICD validation set: 0/24 markets have CLOB prices
4-5. data_collection_runs analysis: 727 runs = 409 distinct markets,
     all from April 2026 open-market monitoring pilot
6. Trade VWAP feasibility: 100% of 17.9M trades have valid 0-1 price
7. Recommendation: trade VWAP unblocks ILS now; CLOB batch is Option A

reports/TASK_02C_CLOB_DIAGNOSTICS.md: generated output

Root cause of TASK_02C_RESULTS.md contradiction:
- 727 CLOB runs / 1.55M rows → open market monitoring pilot
- 0% ILS coverage → CLOB never ran for historical resolved markets
Both statements were correct about different market sets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…line

fflow/scoring/price_series.py (new):
  reconstruct_price_series(): CLOB first, trade VWAP fallback with
  1-min bucketing, forward-fill gaps. Returns DataFrame with source col.
  get_price_at(): CLOB→trade VWAP two-tier lookup, ±5min tolerance.

fflow/scoring/pipeline.py:
  compute_market_label() gains price_source='auto' param.
  'auto' = CLOB first, trade VWAP fallback; 'clob' = CLOB only;
  'trade_vwap' = force trades. Stores actual source in label row.

fflow/models.py:
  MarketLabel.price_source TEXT column added.

alembic/versions/0003_price_source.py:
  Migration 0002→0003, applied to DB.

tests/test_price_series.py: 9 tests, all pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
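
The trade-VWAP fallback described in the commit above (1-minute bucketing with forward-filled gaps) can be sketched in plain Python. This is an illustration of the technique under simplifying assumptions, not the body of `reconstruct_price_series()`: the function name, the `(timestamp, price, size)` tuple layout, and the list-of-tuples return type are all hypothetical (the real function returns a DataFrame).

```python
from collections import defaultdict


def vwap_minute_series(trades, start_minute: int, end_minute: int):
    """Build a 1-minute VWAP series with forward-fill.

    trades: iterable of (unix_ts_seconds, price, size).
    Returns a list of (bucket_start_ts, price_or_None), one entry per minute
    from start_minute to end_minute inclusive (minutes since the epoch).
    """
    notional = defaultdict(float)  # sum of price * size per minute bucket
    volume = defaultdict(float)    # sum of size per minute bucket
    for ts, price, size in trades:
        bucket = int(ts // 60)
        notional[bucket] += price * size
        volume[bucket] += size

    series = []
    last_price = None  # stays None until the first bucket with trades
    for minute in range(start_minute, end_minute + 1):
        if volume.get(minute, 0.0) > 0:
            last_price = notional[minute] / volume[minute]
        series.append((minute * 60, last_price))  # forward-fill empty buckets
    return series


trades = [(0, 0.40, 10), (30, 0.50, 30), (185, 0.60, 5)]
for bucket_ts, price in vwap_minute_series(trades, 0, 3):
    print(bucket_ts, price)
```

Minutes before the first trade carry `None` rather than a guessed price, which is one reason a late-starting series needs separate handling for `t_open`.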
…fallback

- Key in URL path via _uma_subgraph_url() (not just Authorization header)
- On subgraph failure, fall back to eth_getLogs on UMA OOv2 at 1rpc.io/matic
- ABI-decode Settle event non-indexed data (bytes ancillaryData, int256 resolvedPrice)
- Default FFLOW_POLYGON_RPC_URL changed from polygon-rpc.com to 1rpc.io/matic
- Add RetryableHTTPClient.post() method

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
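
As an illustration of the ABI-decoding step in the commit above, the sketch below decodes a standard ABI-encoded `(bytes, int256)` tuple by hand. It assumes the log data contains exactly the two fields named in the commit message; the on-chain Settle event may carry additional non-indexed fields, in which case the offsets would differ. Both function names are hypothetical, and the encoder exists only to produce test input.

```python
def decode_bytes_int256(data: bytes):
    """Decode ABI-encoded (bytes, int256) into (payload, value).

    Layout: word 0 = offset of the dynamic `bytes` tail, word 1 = int256,
    tail = 32-byte length followed by the payload padded to 32 bytes.
    """
    offset = int.from_bytes(data[0:32], "big")
    value = int.from_bytes(data[32:64], "big", signed=True)
    length = int.from_bytes(data[offset:offset + 32], "big")
    payload = data[offset + 32:offset + 32 + length]
    if len(payload) != length:
        raise ValueError("log data is truncated")
    return payload, value


def encode_bytes_int256(payload: bytes, value: int) -> bytes:
    """Inverse of the decoder, used here only to build example input."""
    padded = payload + b"\x00" * (-len(payload) % 32)
    return (
        (64).to_bytes(32, "big")                 # tail starts after two head words
        + value.to_bytes(32, "big", signed=True)
        + len(payload).to_bytes(32, "big")
        + padded
    )


raw = encode_bytes_int256(b"q: title: Will X happen?", 10**18)
ancillary_data, resolved_price = decode_bytes_int256(raw)
print(ancillary_data, resolved_price)
```

In practice a library decoder driven by the contract ABI is less error-prone than manual offsets; the manual version is shown only to make the layout explicit.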
- `fflow news tier1-batch [--limit N]`: bulk Tier 1 for all markets
  with resolution_evidence_url but no existing news_timestamps row
- `fflow news seed-proxy [--market-ids ...] [--category ...] [--offset-days N]`:
  seed synthetic T_news from end_date-N days (tier=2, confidence=0.50)
  for admin-resolved markets without UMA evidence

Seeded 24 FFICD validation markets with tier=2 proxy (end_date-1d).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
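
The proxy rule used by `fflow news seed-proxy` in the commit above (T_news = end_date minus N days, tier=2, confidence=0.50) amounts to the following. The dataclass and function names are hypothetical, and the default of one day mirrors the FFICD seeding run rather than a documented CLI default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ProxyNewsTimestamp:
    """Hypothetical stand-in for a news_timestamps row."""
    market_id: str
    t_news: datetime
    tier: int
    confidence: float


def seed_proxy(market_id: str, end_date: datetime, offset_days: int = 1) -> ProxyNewsTimestamp:
    """Build a synthetic T_news at end_date - offset_days (tier 2, confidence 0.50)."""
    return ProxyNewsTimestamp(
        market_id=market_id,
        t_news=end_date - timedelta(days=offset_days),
        tier=2,
        confidence=0.50,
    )


row = seed_proxy("fficd-001", datetime(2025, 11, 5, tzinfo=timezone.utc))
print(row)
```

Because the proxy is derived from `end_date` alone, it carries no information about when news actually broke.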
- Catch PriceLookupError from compute_ils, return None gracefully
- Snap t_open to first available trade when price series starts late
  (common for long-running markets with sparse early trading)
- Guard t_news < t_open: return None with t_news_predates_t_open warning
- Fix MarketLabel.pre_news_max_jump: NUMERIC(8,6) → NUMERIC(20,6)
  (it's a USDC amount, not a price; can exceed 99.999999)
- Migration 0004 to alter column type

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
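
The three pipeline guards in the commit above can be sketched as two small helpers. This is a simplified illustration, not the code in pipeline.py: `PriceLookupError` is redefined locally as a stand-in, the helper names are hypothetical, and comparing `t_news` against the snapped `t_open` (rather than the original one) is an assumption.

```python
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple


class PriceLookupError(Exception):
    """Local stand-in for the pipeline's exception of the same name."""


def effective_window(
    t_open: datetime, t_news: datetime, first_trade_at: datetime
) -> Tuple[Optional[datetime], Optional[str]]:
    """Snap t_open forward to the first trade, then check t_news ordering."""
    snapped = max(t_open, first_trade_at)  # series may start after t_open
    if t_news < snapped:
        return None, "t_news_predates_t_open"
    return snapped, None


def score_or_none(compute_ils: Callable[[], float]) -> Optional[float]:
    """Run the scorer; treat a missing price as 'no label', not a crash."""
    try:
        return compute_ils()
    except PriceLookupError:
        return None


utc = timezone.utc
print(effective_window(datetime(2025, 1, 1, tzinfo=utc),
                       datetime(2025, 1, 5, tzinfo=utc),
                       datetime(2025, 1, 10, tzinfo=utc)))
```

Returning `None` with a named warning lets a batch run continue past unscorable markets while still recording why each one was skipped.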
4/24 markets scored (all others: no trade data or proxy T_news failures).
Key finding: T_news proxy quality (end_date-1d) is the dominant error source;
high |ILS| values reflect price-convergence noise, not informed trading signal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- scripts/make_foresightflow_fixture.py: full rewrite — hard cutoff
  2025-09-15 invariant, NegRisk + secondary bucket exclusion, per-category
  quota sampling (crypto=8, politics=8, sports=8, economics=8,
  geopolitics=9, entertainment=9), Brier calibration check
- data/fixture_phase0.jsonl: 50 markets, 26 YES / 24 NO, 0 pre-cutoff,
  0 bucket markets; all baselines from trade_vwap (CLOB not backfilled)
- CHARTER_v0.3.md: updated project charter
- .gitignore: exclude .claude/ and memory/ directories

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
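
The per-category quota sampling and the Brier check from the commit above can be sketched as follows. The quotas are taken from the commit message; everything else is an assumption for illustration: the function names, the `category` field, the fixed seed, and the reading of "Brier calibration check" as the mean squared error between baseline prices and binary outcomes.

```python
import random
from collections import defaultdict

# Quotas as listed in the commit message; they sum to the 50-market fixture.
QUOTAS = {
    "crypto": 8, "politics": 8, "sports": 8,
    "economics": 8, "geopolitics": 9, "entertainment": 9,
}


def sample_by_quota(markets, quotas, seed=0):
    """Draw exactly quotas[cat] markets per category, without replacement."""
    rng = random.Random(seed)  # fixed seed so the fixture is reproducible
    by_category = defaultdict(list)
    for market in markets:
        by_category[market["category"]].append(market)
    picked = []
    for category, quota in quotas.items():
        pool = by_category.get(category, [])
        if len(pool) < quota:
            raise ValueError(f"not enough {category} markets: {len(pool)} < {quota}")
        picked.extend(rng.sample(pool, quota))
    return picked


def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(probabilities)


# Hypothetical candidate pool: 10 markets per category.
candidates = [{"id": f"{c}-{i}", "category": c} for c in QUOTAS for i in range(10)]
fixture = sample_by_quota(candidates, QUOTAS)
print(len(fixture), brier_score([0.5, 0.5], [1, 0]))
```

A baseline that always quotes 0.5 scores 0.25, so a fixture-wide Brier score well below that suggests the baseline prices are informative.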
@MaksymDS MaksymDS merged commit 7d54759 into master Apr 27, 2026
1 check failed
@MaksymDS MaksymDS deleted the task02d/price-reconstruction-and-uma branch May 1, 2026 14:14