measure(H_1041): 🔴 IMAGINE-ADVANTAGE-IS-TASK-SPECIFIC — MPC reclaims deep-horizon lead on harder (nonlinear + partial-obs) tasks#1959
Merged
…rtial-obs Kalman) + reproduce-H_1034 gate Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…deep-horizon lead on nonlinear + partial-obs harder tasks
H_1034 imagine>MPC advantage does NOT generalize: on a nonlinear pendulum swing-up
(MPC plans exact nonlinear dynamics, linear-LDS WM cannot) and a partial-obs+noise
station-keeping with a Kalman-belief MPC, the best MPC catches/beats imagine at deep
horizon {8,16} (Task A lead -0.83/-0.89, Task B -29/-23; all Welch p<1e-3). The
mechanism (robust planning vs brittle noise-free CEM landscape) transfers; the headline
does not. reproduce-H_1034 = bit-identical PASS before scoring. $0 CPU-local, 0 pods,
g5 CODE-measured, a_phi_iit4_tool n/a. Toy single rung per task, scale-transfer UNVERIFIED.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
H_1041 — does imagine-rollout still beat MPC on a HARDER control task? (H_1034 generalization)
🔴 RED — IMAGINE-ADVANTAGE-IS-TASK-SPECIFIC (closed-negative, a_paper_negative_ok). $0 CPU-local, 0 pods, g5 CODE-measured, a_phi_iit4_tool n/a.
Pre-registered falsifier (UNIVERSE/H_1041_imagine_harder_control.md, merged #1939) filled verbatim: three planners (naive-MPC, robust scenario/SAA tube-MPC, imagine-rollout through anima's learned LDS WM), frozen depth ladder {1,2,4,8,16}, N_RUNS=40×40ep, GAP_TOL=0.05, Welch p<1e-3. PASS = imagine still leads the best MPC by >GAP_TOL at deep horizon {8,16} on ≥1 harder task.
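The PASS criterion above can be sketched as a small per-depth decision function. This is a minimal sketch, not the scoring code in this PR. Only GAP_TOL=0.05, the Welch p<1e-3 threshold, and the deep depths {8,16} come from the pre-registration. The function names, the convention that a higher return is better (so lead = imagine minus best MPC), and the stdlib-only p-value computed by numerical integration are assumptions made for illustration.

```python
import math
from statistics import mean, variance

GAP_TOL = 0.05  # from the pre-registration
ALPHA = 1e-3    # Welch p-value threshold from the pre-registration


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_sided_p(t, df, steps=10000):
    """Two-sided Student-t p-value via Simpson integration of the pdf on [0, |t|].

    |t| is clamped at 50: for the degrees of freedom that arise with tens of
    runs per arm, the tail beyond 50 is far below the 1e-3 threshold.
    """
    upper = min(abs(t), 50.0)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = upper / steps  # steps is even, as Simpson's rule requires
    s = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)


def imagine_leads(imagine, best_mpc, gap_tol=GAP_TOL, alpha=ALPHA):
    """True if imagine's mean return beats the best MPC by > gap_tol, significantly."""
    lead = mean(imagine) - mean(best_mpc)
    t, df = welch_t(imagine, best_mpc)
    return lead > gap_tol and two_sided_p(t, df) < alpha
```

Under these assumptions, a task would count toward PASS when `imagine_leads` holds at the deep depths; whether both depths or either one must hold is fixed by the pre-registration file, not by this sketch.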
reproduce-H_1034 gate = PASS (bit-identical)
Re-ran the H_1034 planners; naive/robust/imag curves matched the stored .verdicts/1034verdict EXACTLY before scoring (g73 honored).
Result = FAIL on BOTH harder tasks (advantage does NOT generalize)
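Task A (nonlinear pendulum swing-up) is the clean FAIL: both MPCs plan with the EXACT nonlinear true dynamics, while imagine's WM is a LINEAR LDS that cannot capture sin(θ) (imagine lead at depth {8,16}: -0.83/-0.89). Task B (partial-obs + noise station-keeping) FAILs by a larger margin (lead -29/-23): the linear WM diverges under heavy obs-noise, while the MPCs get the optimal Kalman belief. All Welch p<1e-3.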
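To illustrate why a linear model struggles on Task A, the sketch below compares an ideal pendulum's nonlinear angular acceleration with its small-angle linearization. It is illustrative only: the frictionless dynamics, the gravity/length ratio, the step size, and all names are assumptions, not the task definition or the learned LDS used in this PR. A learned LDS would fit a different linear map, but no single linear map matches sin(θ) over the full swing-up range.

```python
import math

G_OVER_L = 9.81  # assumed gravity/length ratio


def nonlinear_acc(theta):
    """Angular acceleration of an ideal pendulum; theta is measured from hanging-down."""
    return -G_OVER_L * math.sin(theta)


def linear_acc(theta):
    """Small-angle linearization around theta = 0, using sin(theta) ~ theta."""
    return -G_OVER_L * theta


def rollout(acc_fn, theta0, steps, dt=0.01):
    """Semi-implicit Euler rollout from rest; returns the final angle."""
    theta, omega = theta0, 0.0
    for _ in range(steps):
        omega += dt * acc_fn(theta)
        theta += dt * omega
    return theta


def model_gap(theta0, steps=16):
    """Absolute angle disagreement between the two models after `steps` steps."""
    return abs(rollout(nonlinear_acc, theta0, steps) - rollout(linear_acc, theta0, steps))


if __name__ == "__main__":
    for theta0 in (0.1, 1.0, 3.0):
        print(f"theta0={theta0:.1f} rad  gap after 16 steps = {model_gap(theta0):.5f} rad")
```

The gap is negligible near the hanging position and grows quickly toward the upright position, which is the region a swing-up task has to cross.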
Read
The H_1034 'imagine beats MPC at deep horizon' was SPECIFIC to stiff-LINEAR CEM-landscape difficulty. When the true dynamics are nonlinear or the optimal belief (Kalman) is available, a true-model MPC reclaims the deep-horizon lead. The MECHANISM (robust/expected-return planning beats a brittle noise-free landscape) transfers; the HEADLINE (imagine > MPC) does NOT generalize beyond the stiff-linear toy.
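For a linear-Gaussian system, the "optimal belief (Kalman)" mentioned above is the Kalman filter posterior. A minimal scalar sketch follows, assuming a 1-D state x' = a·x + u + w observed as y = x' + v. The model, the noise variances `q` and `r`, and all names are hypothetical and are not taken from this PR's station-keeping task.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    mean: float
    var: float


def kalman_step(b, u, y, a=1.0, q=0.01, r=1.0):
    """One predict + update step for x' = a*x + u + w, y = x' + v.

    q and r are the (assumed) process- and observation-noise variances.
    """
    # Predict: push the current belief through the linear dynamics.
    pred_mean = a * b.mean + u
    pred_var = a * a * b.var + q
    # Update: blend prediction and observation using the Kalman gain.
    gain = pred_var / (pred_var + r)
    return Belief(pred_mean + gain * (y - pred_mean), (1.0 - gain) * pred_var)


if __name__ == "__main__":
    belief = Belief(mean=0.0, var=1.0)
    for obs in (0.9, 1.2, 0.7):
        belief = kalman_step(belief, u=0.0, y=obs)
        print(belief)
```

A belief-space MPC would then plan from `mean` (and possibly `var`) rather than from the raw noisy observation.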
TOY single rung per task; scenario-tube + Kalman-belief variants only; scale-transfer UNVERIFIED (a_scale_honest_scope · a_toy_scale_recheck).
Artifacts:
UNIVERSE/h1041_imagine_harder_control.py · .verdicts/1041_imagine_harder_control/H_1041.txt · tiered UNIVERSE/H_1041_imagine_harder_control.md.
🤖 Generated with Claude Code