Task 02f: Control group, proxy refinement, Epstein deep-dive, Barak LLM Tier 3 PoC#7
Merged
Epstein files release confirmed: House Oversight Dems released 68 estate photos (including Barak) on 2025-12-18T18:00Z, 4.8 days before the resolved_at-24h proxy anchor (Dec 22).

ILS comparison:
- proxy (Dec 22): ILS = 0.553, p_news = 0.629
- LLM (Dec 18): ILS = 0.570, p_news = 0.643 (ΔILS = +0.017)

Wallet reclassification (15 top wallets):
- PRE_BOTH (genuinely pre-news): 6 wallets, dominated by veteran 0x4bfb41d5
- PRE_PROXY_ONLY (reactive post-news): 8 wallets, entered Dec 19-22 after crash
- POST_BOTH: 1 wallet

Dec 20 crash (21.6%) now explained: resolution criteria uncertainty after photo release (does a photo count?), not insider selling.

Conclusion: ILS = 0.570 reflects market informativeness, not insider trading.

Also extends fflow news tier3 with --validation-set and --max-cost flags, and improves llm_match.py prompt + extra_context support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Task 02f tested whether the pilot's 20.3% positive ILS rate represents genuine informed-flow signal or proxy artefact. Findings are negative but methodologically valuable.
Phases
Phase 1 (control group): 725 event_resolved vs 683 unclassifiable, Mann-Whitney p=10⁻⁶, but the separation is REVERSED (control 21.4% vs pilot 15.2% positive). Reason: the resolved_at−24h proxy is structurally valid for sports/behavioural markets but misaligned with event_resolved political/regulatory markets.
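The Phase 1 comparison can be illustrated with a minimal, standard-library sketch of a two-sided Mann-Whitney U test using the normal approximation with tie correction (no continuity correction). This is not the project's own code: the function names and the sample ILS values are hypothetical, and the real analysis may well have used a statistics library instead.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value via normal approximation)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: each group of t tied values reduces the variance.
    tie_term = sum(t**3 - t for t in Counter(combined).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical ILS samples, for illustration only.
pilot_ils = [0.10, -0.20, 0.05, 0.30, -0.40]
control_ils = [0.15, 0.25, -0.10, 0.35, 0.45]
u, p = mann_whitney_u(pilot_ils, control_ils)
print(f"U = {u}, p = {p:.3f}")
```

A small p-value only says the two distributions differ; the direction of the difference (here, which group has the higher positive rate) has to be read off separately, which is how a significant but reversed separation can arise.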
Phase 2 (proxy sensitivity): Tighter offsets (24h→6h→2h→1h) collapse the positive rate to 0%. Spearman ρ(24h, 1h) = 0.542. ILS is not robust to anchor choice.
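The anchor-sensitivity check in Phase 2 amounts to a rank correlation between per-market ILS values computed at two offsets. A minimal sketch, assuming ILS has already been computed per market at each offset; the function names and the sample values are hypothetical, and Spearman's ρ is computed here as the Pearson correlation of average ranks.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in ra))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in rb))
    return cov / (sd_a * sd_b)


# Hypothetical per-market ILS at two anchor offsets, for illustration only.
ils_24h = [0.55, 0.10, -0.20, 0.93, 0.64]
ils_1h = [0.05, -0.10, -0.30, 0.02, -0.40]
print(f"rho(24h, 1h) = {spearman_rho(ils_24h, ils_1h):.3f}")
```

A ρ well below 1 means markets are ranked differently depending on the offset, so conclusions drawn from the 24h anchor do not automatically carry over to tighter ones.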
Phase 3 (Epstein deep-dive): AOC ILS=0.93 and Sanders ILS=0.64 are formula edge effects (high p_open). Barak ILS=0.55 is genuinely interesting but anchor-unstable. The dominant wallet 0x4bfb41d5 is a veteran professional (5,115 markets), not an insider.
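The "high p_open" edge effect can be illustrated with a small sketch. The ILS formula is not stated in this description, so the form below, ILS = (p_news − p_open) / (p_resolve − p_open), is an assumption made purely for illustration, as are the function name and the input prices.

```python
def ils(p_open, p_news, p_resolve=1.0):
    """ASSUMED form of ILS (not confirmed by the source):
    share of the open-to-resolution price move already realised at the news anchor."""
    denom = p_resolve - p_open
    if denom == 0:
        raise ValueError("ILS undefined when p_open equals p_resolve")
    return (p_news - p_open) / denom


# Same absolute pre-news move (+0.04), hypothetical prices.
low_open = ils(p_open=0.20, p_news=0.24)   # denominator 0.80
high_open = ils(p_open=0.95, p_news=0.99)  # denominator 0.05
print(f"low p_open:  ILS = {low_open:.2f}")
print(f"high p_open: ILS = {high_open:.2f}")
```

Under this assumed form, a market that opens close to its resolution price has a very small denominator, so a few cents of drift yield a large ILS without any informed flow being involved.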
Phase 4 (Barak LLM Tier 3 PoC): Recovered T_news = Dec 18 2025 (House Oversight Committee Epstein photo release, multi-source verified). ILS shifts only 0.553 → 0.570 with the correct anchor. 8 of 15 "early" wallets actually entered after the real news. Dec 20 crash explained as resolution-criteria arbitrage, not insider selling.
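The wallet reclassification in Phase 4 compares each wallet's first entry against both anchors. A minimal sketch, assuming one first-entry timestamp per wallet: the T_news value is the release time reported for this PR, while the proxy-anchor time of day, the function name, and the wallet entries are hypothetical placeholders.

```python
from collections import Counter
from datetime import datetime, timezone

# Reported release time of the photos (LLM-recovered anchor).
T_NEWS = datetime(2025, 12, 18, 18, 0, tzinfo=timezone.utc)
# Placeholder for the resolved_at-24h proxy anchor: only the date (Dec 22) is
# reported, so the time of day used here is an assumption.
T_PROXY = datetime(2025, 12, 22, 0, 0, tzinfo=timezone.utc)


def classify_wallet(first_entry, t_news=T_NEWS, t_proxy=T_PROXY):
    """Label a wallet by whether its first entry precedes each anchor."""
    pre_news = first_entry < t_news
    pre_proxy = first_entry < t_proxy
    if pre_news and pre_proxy:
        return "PRE_BOTH"
    if pre_proxy:
        return "PRE_PROXY_ONLY"
    if not pre_news:
        return "POST_BOTH"
    # Only reachable if the proxy anchor precedes the news anchor.
    raise ValueError("entry precedes news but not proxy; check anchor order")


# Hypothetical first-entry timestamps, for illustration only.
first_entries = {
    "wallet_a": datetime(2025, 12, 10, 9, 0, tzinfo=timezone.utc),
    "wallet_b": datetime(2025, 12, 20, 14, 30, tzinfo=timezone.utc),
    "wallet_c": datetime(2025, 12, 22, 8, 0, tzinfo=timezone.utc),
}
labels = {w: classify_wallet(ts) for w, ts in first_entries.items()}
print(labels)
print(Counter(labels.values()))
```

Wallets labelled PRE_PROXY_ONLY look "early" only relative to the proxy anchor; against the recovered news time they are reacting to public information.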
Methodological contributions for paper v0.8
Files
After merge
Branch closes. Paper v0.8 published. Next: Task 02g (batch LLM Tier 3 on FFIC validation set + matched control), enabled by Anthropic API key now configured in .env.
🤖 Generated with Claude Code