Task 02f: Control group, proxy refinement, Epstein deep-dive, Barak LLM Tier 3 PoC #7

Merged
MaksymDS merged 1 commit into master from task02f/control-group-and-proxy-refinement on Apr 27, 2026

@MaksymDS

Summary

Task 02f tested whether the pilot's 20.3% positive ILS rate represents genuine informed-flow signal or proxy artefact. Findings are negative but methodologically valuable.

Phases

  • Phase 1 (control group): 725 event_resolved vs 683 unclassifiable, Mann-Whitney p=10⁻⁶, separation REVERSED (control 21.4% vs pilot 15.2% positive). Reason: the resolved_at−24h proxy is structurally valid for sports/behavioural markets but misaligned with event_resolved political/regulatory markets.

  • Phase 2 (proxy sensitivity): Tighter offsets (24h→6h→2h→1h) collapse the positive rate to 0%. Spearman ρ(24h, 1h) = 0.542. ILS is not robust to anchor choice.

  • Phase 3 (Epstein deep-dive): AOC ILS=0.93 and Sanders ILS=0.64 are formula edge effects (high p_open). Barak ILS=0.55 is genuinely interesting but anchor-unstable. The dominant wallet, 0x4bfb41d5, is a veteran professional (5,115 markets), not an insider.

  • Phase 4 (Barak LLM Tier 3 PoC): Recovered T_news = Dec 18 2025 (House Oversight Committee Epstein photo release, multi-source verified). ILS shifts only 0.553 → 0.570 with correct anchor. 8 of 15 "early" wallets actually entered post-real-news. Dec 20 crash explained as resolution-criteria arbitrage, not insider selling.
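
The Phase 2 sensitivity check can be sketched as follows. This is a minimal illustration, not the code in scripts/proxy_refinement.py: it assumes that "positive" means ILS > 0, the function names are hypothetical, and the ILS values are made up for the example.

```python
from math import sqrt


def positive_rate(ils_values):
    """Share of markets whose ILS is strictly positive (assumed definition)."""
    return sum(v > 0 for v in ils_values) / len(ils_values)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up ILS values for five markets at two anchor offsets (not PR data).
ils_24h = [0.40, -0.10, 0.25, -0.30, 0.05]
ils_1h = [-0.05, -0.20, -0.02, -0.40, -0.10]

rate_24h = positive_rate(ils_24h)
rate_1h = positive_rate(ils_1h)
rho = spearman(ils_24h, ils_1h)
```

In this toy data the market ordering is largely preserved across offsets while the positive rate drops to zero, which shows why the rank correlation and the positive rate are worth reporting separately.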

Methodological contributions for paper v0.8

  1. Resolution typology (event-resolved vs deadline-resolved vs unclassifiable)
  2. Edge-effect scope condition: |p_open - 0.5| ≤ 0.4 required for interpretable ILS
  3. Anchor-sensitivity scope condition: multi-window robustness check required
  4. Proxy quality not the binding constraint (validated by Barak PoC)
  5. Resolution-criteria arbitrage as distinct phenomenon
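
The edge-effect scope condition in item 2 amounts to a one-line filter. The sketch below is illustrative only: the function name, the market labels, and the opening prices are hypothetical, and only the threshold |p_open - 0.5| ≤ 0.4 comes from the list above.

```python
def ils_is_interpretable(p_open, max_distance=0.4):
    """Scope condition from item 2: |p_open - 0.5| <= 0.4."""
    return abs(p_open - 0.5) <= max_distance


# Hypothetical opening prices, chosen only to exercise the filter.
markets = {"market_a": 0.17, "market_b": 0.95, "market_c": 0.50}

# Keep only markets whose opening price is far enough from 0 and 1.
kept = [name for name, p_open in markets.items() if ils_is_interpretable(p_open)]
```

Here market_b is dropped because its opening price sits too close to 1, which is the regime the Phase 3 bullet describes as a formula edge effect.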

Files

reports/TASK_02F_CONTROL_COMPARISON.md
reports/TASK_02F_PROXY_REFINEMENT.md
reports/TASK_02F_EPSTEIN_CASE_STUDY.md
reports/TASK_02F_BARAK_LLM_TIER3.md
reports/TASK_02F_FINAL.md

scripts/build_control_group.py
scripts/proxy_refinement.py
scripts/epstein_phase3_query.py
scripts/tier3_barak.py
config/validation_markets.yaml

After merge

Branch closes. Paper v0.8 published. Next: Task 02g (batch LLM Tier 3 on FFIC validation set + matched control), enabled by Anthropic API key now configured in .env.

🤖 Generated with Claude Code

Epstein files release confirmed: House Oversight Dems released 68 estate
photos (including Barak) on 2025-12-18T18:00Z — 4.8 days before the
resolved_at-24h proxy anchor (Dec 22).

ILS comparison:
  proxy (Dec 22):  ILS = 0.553, p_news = 0.629
  LLM   (Dec 18):  ILS = 0.570, p_news = 0.643  (ΔILS = +0.017)
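
The ILS formula is not stated in this PR. As a back-of-envelope consistency check only, the sketch below assumes a normalized form, ILS = (p_news - p_open) / (p_resolve - p_open), with p_resolve = 1 for a YES resolution. Under that assumption both reported (ILS, p_news) pairs imply roughly the same opening price, near 0.17; that value is back-solved here and is not a figure from the PR. The function names are hypothetical.

```python
def ils(p_open, p_news, p_resolve=1.0):
    """Assumed normalized form: share of the open-to-resolution move priced in by T_news."""
    return (p_news - p_open) / (p_resolve - p_open)


def implied_p_open(ils_value, p_news, p_resolve=1.0):
    """Invert the assumed formula to recover the opening price."""
    return (p_news - ils_value * p_resolve) / (1.0 - ils_value)


# Reported pairs from the comparison above.
proxy_open = implied_p_open(0.553, 0.629)  # proxy anchor (Dec 22)
llm_open = implied_p_open(0.570, 0.643)    # LLM anchor (Dec 18)
delta_ils = round(0.570 - 0.553, 3)

# Edge effect under the same assumption: with a high opening price the
# denominator is small, so a 4-point move yields a large ILS.
edge_case = ils(p_open=0.95, p_news=0.99)
```

If the assumed form is right, it also accounts for the edge effects noted in Phase 3: as p_open approaches p_resolve, small price moves produce large ILS values.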

Wallet reclassification (15 top wallets):
  PRE_BOTH  (genuinely pre-news): 6 wallets — dominated by veteran 0x4bfb41d5
  PRE_PROXY_ONLY (reactive post-news): 8 wallets — entered Dec 19-22 after crash
  POST_BOTH: 1 wallet
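
The three labels can be reproduced by comparing a wallet's first entry time with both anchors. This is a minimal sketch, not the logic in scripts/tier3_barak.py: the function name is hypothetical, and because the PR gives only the date of the proxy anchor (Dec 22), the time of day used for T_PROXY is a placeholder assumption.

```python
from datetime import datetime, timezone

# T_NEWS is the release time stated above; T_PROXY's time of day is assumed.
T_NEWS = datetime(2025, 12, 18, 18, 0, tzinfo=timezone.utc)
T_PROXY = datetime(2025, 12, 22, 23, 59, tzinfo=timezone.utc)


def classify_wallet(first_entry, t_news=T_NEWS, t_proxy=T_PROXY):
    """Label a wallet by where its first entry falls relative to both anchors."""
    if first_entry < t_news:
        return "PRE_BOTH"        # before the real news and before the proxy
    if first_entry < t_proxy:
        return "PRE_PROXY_ONLY"  # after the real news, but before the proxy
    return "POST_BOTH"           # after both anchors


# Hypothetical entry times, one per label.
early = classify_wallet(datetime(2025, 12, 10, 9, 0, tzinfo=timezone.utc))
reactive = classify_wallet(datetime(2025, 12, 20, 15, 0, tzinfo=timezone.utc))
late = classify_wallet(datetime(2025, 12, 23, 12, 0, tzinfo=timezone.utc))
```

Wallets in the middle bucket look early only against the proxy anchor, which is why the 8 PRE_PROXY_ONLY wallets are read as reactive rather than informed.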

Dec 20 crash (21.6%) now explained: resolution criteria uncertainty after
photo release (does a photo count?), not insider selling.

Conclusion: ILS = 0.570 reflects market informativeness, not insider trading.

Also extends fflow news tier3 with --validation-set and --max-cost flags,
and improves llm_match.py prompt + extra_context support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MaksymDS merged commit 7af9cf1 into master on Apr 27, 2026
1 check failed
MaksymDS deleted the task02f/control-group-and-proxy-refinement branch on May 1, 2026 at 14:14