measure(H_1041): 🔴 IMAGINE-ADVANTAGE-IS-TASK-SPECIFIC — MPC reclaims deep-horizon lead on harder (nonlinear + partial-obs) tasks#1959

Merged: dancinlife merged 2 commits into main from worktree-agent-a0e08c829dc1e0dad on Jun 8, 2026

@dancinlife

H_1041 — does imagine-rollout still beat MPC on a HARDER control task? (H_1034 generalization)

🔴 RED — IMAGINE-ADVANTAGE-IS-TASK-SPECIFIC (closed-negative, a_paper_negative_ok). $0 CPU-local, 0 pods, g5 CODE-measured, a_phi_iit4_tool n/a.

Pre-registered falsifier (UNIVERSE/H_1041_imagine_harder_control.md, merged #1939) filled verbatim: three planners (naive-MPC, robust scenario/SAA tube-MPC, imagine-rollout through anima's learned LDS WM), frozen depth ladder {1,2,4,8,16}, N_RUNS=40×40ep, GAP_TOL=0.05, Welch p<1e-3. PASS = imagine still leads the best MPC by >GAP_TOL at deep horizon {8,16} on ≥1 harder task.
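For readers who want the significance test spelled out, here is a minimal sketch of Welch's unequal-variance t statistic and the Welch-Satterthwaite degrees of freedom, using only the standard library. The function name `welch_t` and the sample data are hypothetical; this is not the code in `UNIVERSE/h1041_imagine_harder_control.py`, and it stops short of the p-value, which needs a Student-t CDF (for example from SciPy).

```python
import math
import statistics
from typing import Sequence, Tuple

def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    na, nb = len(a), len(b)
    # statistics.variance is the unbiased sample variance (divides by n - 1).
    va, vb = statistics.variance(a) / na, statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical per-run returns for two planners (illustrative numbers only).
imagine_runs = [1.0, 2.0, 3.0, 4.0, 5.0]
mpc_runs = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(imagine_runs, mpc_runs)
print(f"t = {t_stat:.4f}, df = {dof:.4f}")
```

A negative `t` here means the first sample has the lower mean, which matches the sign convention of the lead (imag−bestMPC) used in the results.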

reproduce-H_1034 gate = PASS (bit-identical)

Re-ran the H_1034 planners; naive/robust/imag curves matched the stored .verdicts/1034 verdict EXACTLY before scoring (g73 honored).
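A bit-identical gate of this kind can be sketched as a bitwise comparison of float curves, which is stricter than `==` (it distinguishes `0.0` from `-0.0`). The helper names and the dict-of-lists layout are assumptions for illustration; the actual format of the stored `.verdicts/1034` verdict is not shown in this PR.

```python
import struct
from typing import Dict, List

def float_bits(x: float) -> bytes:
    """IEEE-754 double encoding of x, so values are compared bit for bit."""
    return struct.pack("<d", x)

def bit_identical(rerun: Dict[str, List[float]],
                  stored: Dict[str, List[float]]) -> bool:
    """True only if both hold the same curve names with bitwise-equal values."""
    if rerun.keys() != stored.keys():
        return False
    for name, ref in stored.items():
        new = rerun[name]
        if len(new) != len(ref):
            return False
        if any(float_bits(x) != float_bits(y) for x, y in zip(new, ref)):
            return False
    return True

# Hypothetical curves over the depth ladder {1,2,4,8,16} (illustrative values).
stored = {"naive": [0.1, 0.2, 0.3, 0.4, 0.5]}
rerun_ok = {"naive": [0.1, 0.2, 0.3, 0.4, 0.5]}
rerun_drift = {"naive": [0.1, 0.2, 0.1 + 0.2, 0.4, 0.5]}  # 0.1 + 0.2 != 0.3 in floats

print(bit_identical(rerun_ok, stored))     # True
print(bit_identical(rerun_drift, stored))  # False
```

Running the gate before scoring means a drifted planner implementation is caught before any new number is interpreted.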

Result = FAIL on BOTH harder tasks (advantage does NOT generalize)

| task | d=8 lead (imag−bestMPC) | d=16 lead (imag−bestMPC) | result |
|---|---|---|---|
| A: nonlinear pendulum swing-up (angle-only obs, ω hidden) | −0.8336 (p=1.7e-61) | −0.8883 (p=3.9e-58) | MPC beats imagine |
| B: partial-obs + obs-noise station-keeping (Kalman-belief MPC) | −29.29 (p=9.1e-76) | −22.93 (p=1.4e-69) | MPC beats imagine |
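As a sanity check, the pre-registered PASS rule can be applied mechanically to the leads reported above. This is a sketch, not the scoring code from the PR: the function names are hypothetical, and whether "deep horizon {8,16}" means both depths or either depth is an assumption exposed as a flag. For the reported numbers both readings give the same verdict.

```python
from typing import Dict, Tuple

GAP_TOL = 0.05         # from the pre-registration
P_MAX = 1e-3           # Welch significance threshold
DEEP_DEPTHS = (8, 16)  # deep rungs of the frozen ladder {1,2,4,8,16}

# depth -> (lead = imagine minus best MPC, Welch p-value)
Leads = Dict[int, Tuple[float, float]]

def task_passes(leads: Leads, require_all_depths: bool = True) -> bool:
    """Imagine must lead by more than GAP_TOL with p < P_MAX at the deep depths."""
    hits = [leads[d][0] > GAP_TOL and leads[d][1] < P_MAX for d in DEEP_DEPTHS]
    return all(hits) if require_all_depths else any(hits)

def verdict(tasks: Dict[str, Leads], require_all_depths: bool = True) -> str:
    """PASS if at least one harder task passes, otherwise FAIL."""
    ok = any(task_passes(leads, require_all_depths) for leads in tasks.values())
    return "PASS" if ok else "FAIL"

# Values copied from the results table.
reported = {
    "A": {8: (-0.8336, 1.7e-61), 16: (-0.8883, 3.9e-58)},
    "B": {8: (-29.29, 9.1e-76), 16: (-22.93, 1.4e-69)},
}
print(verdict(reported))                            # FAIL
print(verdict(reported, require_all_depths=False))  # FAIL
```

The tiny p-values certify that the leads are significantly negative, so they count against imagine rather than for it; the sign check on the lead is what decides the verdict.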

Task A is the clean FAIL: both MPCs plan the EXACT nonlinear true dynamics while imagine's WM is a LINEAR LDS that can't capture sin(θ). Task B FAILs harder (linear WM diverges under heavy obs-noise; MPCs get the optimal Kalman belief).
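To make the Task A mechanism concrete, the sketch below compares the true pendulum restoring term, proportional to sin(θ), against a linear stand-in proportional to θ. The small-angle linearization and the value of `G_OVER_L` are assumptions for illustration; the PR does not say how the LDS world model was fit, and a fitted linear map would differ in detail while still being unable to bend with sin(θ).

```python
import math

G_OVER_L = 9.81  # assumed g/l (a 1 m pendulum); not taken from the PR

def true_accel(theta: float) -> float:
    """Angular acceleration of an undamped pendulum, theta measured from hanging down."""
    return -G_OVER_L * math.sin(theta)

def linear_accel(theta: float) -> float:
    """Small-angle linearization of the same dynamics around theta = 0."""
    return -G_OVER_L * theta

for deg in (5, 45, 90, 180):
    theta = math.radians(deg)
    err = abs(linear_accel(theta) - true_accel(theta))
    print(f"{deg:>3} deg  true={true_accel(theta):9.4f}  "
          f"linear={linear_accel(theta):9.4f}  abs_err={err:8.4f}")
```

Near the bottom the two agree closely, but at the upright swing-up target (180 degrees) the true restoring acceleration vanishes while the linear model still predicts a large pull back, so a planner rolling out the linear model is misled exactly where the task is decided.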

Read

The H_1034 result ('imagine beats MPC at deep horizon') was SPECIFIC to stiff-LINEAR CEM-landscape difficulty. When the true dynamics are nonlinear or the optimal belief (Kalman) is available, a true-model MPC reclaims the deep-horizon lead. The MECHANISM (robust/expected-return planning beats a brittle noise-free landscape) transfers; the HEADLINE (imagine > MPC) does NOT generalize beyond the stiff-linear toy.
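The mechanism named here, scoring a plan by its expected return over sampled disturbances rather than on a single noise-free rollout, can be sketched as follows. Everything in the block is a hypothetical toy (scalar dynamics, cost weights, noise level, function names); it is a sample-average approximation (SAA) scorer for illustration, not the planners measured in this PR.

```python
import random
from typing import List, Sequence

A = 1.5   # assumed unstable scalar dynamics: x' = A*x + u + w
X0 = 1.0  # assumed initial state

def rollout_return(actions: Sequence[float], noise: Sequence[float]) -> float:
    """Return (negative quadratic cost) of one rollout under a given noise sequence."""
    x, total = X0, 0.0
    for u, w in zip(actions, noise):
        x = A * x + u + w
        total -= x * x + 0.01 * u * u
    return total

def noise_free_score(actions: Sequence[float]) -> float:
    """Score a plan on the single zero-noise rollout (the brittle landscape)."""
    return rollout_return(actions, [0.0] * len(actions))

def saa_score(actions: Sequence[float], n_scenarios: int = 64,
              sigma: float = 0.3, seed: int = 0) -> float:
    """Sample-average approximation: mean return over sampled noise scenarios."""
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(n_scenarios):
        noise = [rng.gauss(0.0, sigma) for _ in actions]
        scores.append(rollout_return(actions, noise))
    return sum(scores) / n_scenarios

plan = [-1.5, 0.0, 0.0, 0.0]  # cancels the initial state exactly, then coasts
print(noise_free_score(plan), saa_score(plan))
```

The open-loop plan looks almost perfect on the noise-free rollout, yet its SAA score is worse because disturbances compound through the unstable dynamics. A planner that ranks candidates by the SAA score sees that fragility; one that ranks by the noise-free score does not.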

TOY single rung per task; scenario-tube + Kalman-belief variants only; scale-transfer UNVERIFIED (a_scale_honest_scope · a_toy_scale_recheck).

Artifacts: UNIVERSE/h1041_imagine_harder_control.py · .verdicts/1041_imagine_harder_control/H_1041.txt · tiered UNIVERSE/H_1041_imagine_harder_control.md.

🤖 Generated with Claude Code

dancinlife and others added 2 commits June 9, 2026 07:36
…rtial-obs Kalman) + reproduce-H_1034 gate

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…deep-horizon lead on nonlinear + partial-obs harder tasks

H_1034 imagine>MPC advantage does NOT generalize: on a nonlinear pendulum swing-up
(MPC plans exact nonlinear dynamics, linear-LDS WM cannot) and a partial-obs+noise
station-keeping with a Kalman-belief MPC, the best MPC catches/beats imagine at deep
horizon {8,16} (Task A lead -0.83/-0.89, Task B -29/-23; all Welch p<1e-3). The
mechanism (robust planning vs brittle noise-free CEM landscape) transfers; the headline
does not. reproduce-H_1034 = bit-identical PASS before scoring. $0 CPU-local, 0 pods,
g5 CODE-measured, a_phi_iit4_tool n/a. Toy single rung per task, scale-transfer UNVERIFIED.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>