V2 baselines on the 2026 Trustees Report; reverse-Roth and 93% options by MaxGhenis · Pull Request #130 · PolicyEngine/crfb-tob-impacts

MaxGhenis · 2026-06-10T06:38:58Z

Summary

Rebuilds the long-horizon baseline datasets with a new division of labor — demographics in weights, economics in values — calibrated to the 2026 Trustees Reports (released June 9, 2026), and adds the two reform options from the May 22 CRFB meeting.

V2 baseline construction (`src/v2_projection.py`, `src/v2_pipeline.py`)

Per projection year: materialize the latest enhanced CPS at the target year → light entropy reweight to the Trustees age distribution → rescale values to Trustees aggregates (α: taxable payroll, cap-aware; β: OASDI cost; best-effort γ: beneficiary other income toward the TOB target) → donor-clone late-year support from 2075 (deterministic jittered clones of real contributor households) → final light entropy calibration hitting age, benefits, payroll, and the TR2026 OASDI/HI taxation-of-benefits series exactly, with v1-style income guards from 2075. Build-time enforcement of the documented publication gates; stage-by-stage sentinel diagnostics; per-year metadata sidecars.
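The age-reweighting step can be pictured with a small sketch. This is not the code in `src/v2_projection.py`: it swaps in iterative proportional fitting (raking), which for consistent 0/1 group-membership constraints converges to the minimum-relative-entropy weights. The function name, the three age groups, and every number are illustrative assumptions.

```python
def rake_to_targets(weights, groups, targets, sweeps=200, tol=1e-12):
    """Scale weights so each group's weighted total matches its target.

    weights: base household weights
    groups:  list of index lists, one per constraint (hypothetical age groups)
    targets: desired weighted total for each group
    """
    w = list(weights)
    for _ in range(sweeps):
        worst = 0.0
        for members, target in zip(groups, targets):
            current = sum(w[i] for i in members)
            factor = target / current
            worst = max(worst, abs(factor - 1.0))
            for i in members:
                w[i] *= factor
        if worst < tol:  # every constraint is already met
            break
    return w


# Illustrative data: four households in three age groups.
base_weights = [100.0, 200.0, 300.0, 400.0]
age_groups = [[0], [1, 2], [3]]        # under 20, 20-64, 65+
age_targets = [120.0, 450.0, 430.0]    # made-up population totals

new_weights = rake_to_targets(base_weights, age_groups, age_targets)
print(new_weights)
```

Within each group the relative weights are preserved, which is the sense in which the reweight stays "light": only the group totals move.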

Fixes three structural issues in the v1 datasets: the population-growth double count in aggregate-growth uprating (β ≈ −1.9%/yr corrects it), the weight-tilt concentration of the TOB base (Mark Sarney's "something is wrong with the 2100 data"), and pre-OBBBA targets under post-OBBBA law.

TR2026 targets (`scripts/extract_tr2026_targets.py`)

Section-aware extraction of intermediate-assumption series: OASDI cost (IV.B1 cost rate × VI.G1 payroll, since VI.G2 truncates at reserve depletion), taxable payroll and GDP (VI.G1), OASDI TOB (IV.B2 % × payroll), HI TOB (CMS 2026 Medicare expanded tables, annual through 2100). TR2026 includes OBBBA in current law, so the TR2025 post-OBBBA bridge (OACT letter deltas + provisional HI scaling) is retired. Population targets are interim: TR2026 V.A3 group totals on the TR2024 single-year-age shape until SSA posts the single-year file. Dashboard denominators and trust-fund gap rates refreshed to TR2026.
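The OASDI cost and OASDI TOB targets described above share one piece of arithmetic: a series published as a percent of taxable payroll, multiplied by the VI.G1 payroll level. A minimal sketch follows; the helper name is hypothetical and the inputs are placeholders, not TR2026 values.

```python
def dollars_from_payroll_share(percent_of_payroll, taxable_payroll):
    """Convert a percent-of-taxable-payroll value into a dollar amount."""
    return percent_of_payroll / 100.0 * taxable_payroll


# Placeholder inputs (NOT Trustees figures); payroll in billions of dollars.
taxable_payroll = 10_000.0
cost_rate_pct = 15.0   # stands in for an IV.B1 cost rate
tob_rate_pct = 0.8     # stands in for an IV.B2 taxation-of-benefits rate

oasdi_cost = dollars_from_payroll_share(cost_rate_pct, taxable_payroll)
oasdi_tob = dollars_from_payroll_share(tob_rate_pct, taxable_payroll)
print(oasdi_cost, oasdi_tob)
```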

New reform options

- reverse_roth (May 22 CRFB meeting, Sarney/Colavito): tax 100% of Social Security benefits immediately and make employee OASDI payroll taxes deductible above the line; Medicare unchanged.
- tax93: taxes 93% of benefits, the Goss-analysis share CRFB planned to interpolate between the 90% and 95% options, run exactly instead.

Both are registered for static and behavioral scoring and excluded from published results until full reform H5 cells exist under the production contract.
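For context, the interpolation that the exact tax93 run replaces would look roughly like the sketch below if it were linear between the two neighboring options. Linearity, the function name, and the revenue figures are assumptions for illustration only.

```python
def interpolate_between_options(share, low_share, low_value, high_share, high_value):
    """Linearly interpolate a result between two scored inclusion shares."""
    position = (share - low_share) / (high_share - low_share)
    return low_value + position * (high_value - low_value)


# Placeholder revenue impacts in billions of dollars (not model output).
revenue_90 = 50.0
revenue_95 = 60.0

approx_93 = interpolate_between_options(93, 90, revenue_90, 95, revenue_95)
print(approx_93)
```

Scoring 93% directly avoids relying on that linearity, which may not hold if the response to the inclusion share is uneven across households.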

Status

- 51 unit tests pass (tests/test_v2_projection.py, tests/test_reforms.py, CI guards)
- Dashboard builds and lints clean
- 16-year dataset build on TR2026 targets (in progress)
- Local proof → paid sentinel → 224-cell static panel + 28 behavioral endpoint cells per docs/current/v2-launch-runbook.md
- Results aggregation, dashboard/paper refresh

🤖 Generated with Claude Code

Per-year construction: materialize the latest enhanced CPS at the target year, lightly entropy-reweight to the Trustees age distribution, rescale values to Trustees aggregates (alpha for taxable payroll, beta for OASDI cost, best-effort gamma for taxation-of-benefits other income), append jittered donor clones of real contributor households from 2075, then run a final light entropy calibration to age, Social Security, payroll, and post-OBBBA TOB targets with v1-style income guards. Calibrating to the post-OBBBA TOB series removes the pre/post-OBBBA mismatch carried by the v1 datasets; value-level scaling removes the population-growth double count in aggregate uprating series. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Also: best-effort gamma (TOB is nearly inelastic to other income at far horizons once 85% inclusion saturates), donor-clone late-year support with deterministic jitter, v1-style self-referential income guards from 2075, and per-year contributor gates matching the documented runtime defaults. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Reverse Roth (per the May 22 CRFB meeting with Mark Sarney and Anthony Colavito): immediately tax 100% of Social Security benefits and make employee OASDI payroll taxes deductible above the line, leaving Medicare on its current Roth-style basis. Ports the implementation drafted on the add-reverse-roth-proposal worktree. tax93: taxes 93% of benefits, the share Steve Goss's SSA analysis attributed to employer contributions and earnings — an exact run of the number CRFB planned to interpolate between the 90% and 95% options. Both are runnable reform IDs and excluded from published results until full reform H5 cells exist under the production contract. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

TR2026 (released June 9, 2026) incorporates OBBBA in current law, so its taxation-of-benefits series replaces the TR2025 post-OBBBA bridge entirely: OASDI TOB from IV.B2 percent of taxable payroll times VI.G1 payroll, HI TOB from the CMS 2026 Medicare Trustees expanded tables (annual through 2100, no carry-forward). OASDI cost comes from the IV.B1 cost rate because the VI.G2 dollar table truncates at reserve depletion. All series use intermediate assumptions; the extractor is section-aware so alternatives cannot leak in. Population targets are interim: TR2026 V.A3 group totals (under 20 / 20-64 / 65+, capturing the fertility and immigration revisions) applied to the TR2024 single-year-age shape until SSA posts the TR2026 single-year file. Also re-solve the earnings and benefit scales after donor-clone augmentation so clone mass cannot push taxable payroll off target. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Taxable payroll and GDP from VI.G1, OASDI gap rates from IV.B1, HI gap rates from the CMS expanded tables. HI taxable payroll scales the TR2026 OASDI payroll by the TR2025 HI/OASDI ratio path because the Medicare expanded tables do not publish HI payroll levels. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-10T06:42:49Z

Vercel site preview: https://crfb-tob-impacts-9e1rab65u-policy-engine.vercel.app

…inel The new baseline-diagnostics battery caught gamma 1.62 pushing 2026 AGI to 155% of GDP while chasing the higher TR2026 HI taxation-of-benefits target. Gamma is now a bounded calibration nudge; broader donor support (3,000 donors x 6 jittered clones) carries the late-year residual, and every build logs AGI and income tax as shares of GDP, warning when AGI exceeds GDP. The dashboard baseline tab gains a 'Baseline trajectories through 2100' section: twelve small-multiple panels showing every calibrated series against its TR2026 target plus the uncalibrated by-products (income tax, AGI, beneficiary and worker counts vs TR2026 IV.B4, statutory payroll-tax checks, income components), each badged calibrated or diagnostic, with a spotlight table and download link. Data comes from scripts/build_v2_baseline_diagnostics.py, which simulates every built year dataset and attaches targets and references. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The 2026 diagnostic decomposition showed miscellaneous and partnership income roughly doubling through the weight tilt alone: without guards, the entropy solve closes taxation-of-benefits residuals by upweighting households heavy in the most concentrated income types. The guards pin ordinary non-payroll and preferential investment income totals at their post-value-scaling levels in every year, not just from 2075. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The published enhanced CPS stores miscellaneous_income summing to $9.3T weighted (real-world total is near $100B), with dozens of person records pinned at exactly $795,294,848 — an imputation top-code artifact, present in both the current and v1-era Hugging Face revisions. Uprated across the 75-year horizon and amplified by donor cloning (the pinned records maximize provisional income and therefore dominate the TOB contributor pool), these records pushed late-year AGI to several times GDP. Values above $10M per person are unambiguously corrupt and are zeroed at materialization, with the repair logged and stamped into each year's metadata. The upstream fix is tracked in policyengine-us-data. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
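The repair described above can be sketched as a threshold filter that also returns a small record for the metadata sidecar. The function name and the metadata keys are hypothetical; only the $10M per-person threshold and the pinned value come from the commit message.

```python
CORRUPT_THRESHOLD = 10_000_000  # dollars per person, from the commit message


def zero_corrupt_values(values, threshold=CORRUPT_THRESHOLD):
    """Zero values strictly above the threshold and report what was repaired."""
    repaired = [0.0 if v > threshold else v for v in values]
    records_zeroed = sum(1 for v in values if v > threshold)
    # Hypothetical metadata shape; the real sidecar fields may differ.
    metadata = {"threshold": threshold, "records_zeroed": records_zeroed}
    return repaired, metadata


# Illustrative person records, including one pinned top-code artifact.
misc_income = [1_200.0, 795_294_848.0, 0.0, 10_000_000.0]
clean_income, repair_log = zero_corrupt_values(misc_income)
print(clean_income, repair_log)
```

A value of exactly $10M is kept in this sketch because the message says values "above" the threshold are zeroed.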

The SOI uprating extensions grow several income categories near 5% per year forever against TR2026 nominal GDP near 3.5%, so AGI outruns GDP at far horizons (132% by the 2090s even after the miscellaneous-income repair). CBO-vintage growth stands through 2034; from 2035 each category's cumulative growth is capped at the GDP path, per the project direction that non-targeted income follows the CBO forecast and similar growth thereafter to the extent it does not contradict the Trustees. Donor support widens to 4,000 donors x 8 clones for the late-year taxation-of-benefits contributor gates. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
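One way to read the cap: after the last CBO-vintage year, a category's cumulative growth may not exceed GDP's cumulative growth over the same span. The sketch below expresses that as a multiplicative factor. Measuring growth from a 2034 anchor, the function name, and the constant 5% / 3.5% / 2% growth paths are assumptions, not the logic of `cap_longrun_income_growth`.

```python
def growth_cap_factor(category_index, gdp_index, year, anchor_year=2034):
    """Factor (<= 1) that holds a category's post-anchor growth to GDP's."""
    if year <= anchor_year:
        return 1.0  # growth through the anchor year stands as-is
    category_growth = category_index[year] / category_index[anchor_year]
    gdp_growth = gdp_index[year] / gdp_index[anchor_year]
    return min(1.0, gdp_growth / category_growth)


# Illustrative index paths with constant nominal growth rates.
years = range(2024, 2101)
gdp = {y: 1.035 ** (y - 2024) for y in years}
fast = {y: 1.05 ** (y - 2024) for y in years}   # outruns GDP
slow = {y: 1.02 ** (y - 2024) for y in years}   # grows slower than GDP

print(growth_cap_factor(fast, gdp, 2095))
print(growth_cap_factor(slow, gdp, 2095))
```

A category that grows more slowly than GDP gets a factor of 1.0 and is left untouched; only faster-growing categories are scaled down.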

A formatting-mismatched patch added cap_longrun_income_growth and its metadata reference without the call, so every year crashed at metadata assembly (NameError) or kept uncapped growth. Verified against real parameters: qualified dividends at 2095 cap at x0.46, interest stays untouched because it grows slower than GDP. A regression test pins the call site. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Populace stores 'unknown' race codes that the strict set_input round-trip rejects even though the native loader tolerates them. Invalid enum values coerce to each variable's default, logged and stamped into metadata. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
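A minimal sketch of that coercion, assuming a plain set of allowed values per variable. The function name, the placeholder enum members, and the metadata keys are hypothetical and do not reflect the policyengine-us enum definitions.

```python
def coerce_invalid_enums(values, allowed, default):
    """Replace values outside the allowed set with the variable's default."""
    repaired = [v if v in allowed else default for v in values]
    values_coerced = sum(1 for v in values if v not in allowed)
    # Hypothetical metadata shape for the per-year sidecar.
    return repaired, {"default": default, "values_coerced": values_coerced}


# Placeholder enum members; "unknown" is the stored code the strict path rejects.
allowed_codes = {"GROUP_A", "GROUP_B", "GROUP_C"}
stored_codes = ["GROUP_B", "unknown", "GROUP_C", "unknown"]

fixed_codes, coercion_log = coerce_invalid_enums(stored_codes, allowed_codes, "GROUP_A")
print(fixed_codes, coercion_log)
```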

Populace ships social_security alongside its four components; keeping the stored aggregate would shadow the adds formula and break the benefit-scaling stage. The frame now keeps components and lets aggregates recompute, covering both the populace layout and the enhanced-CPS behavioral-response case. Populace is also pinned to a stable project path because the Hugging Face cache evicted the snapshot mid-build. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Populace ships pre-uprated columns for 2024-2035 per variable; only the base period belongs in the frame so the pipeline stays the single uprating authority and out-year aggregates cannot bypass the value stages (the leak guard caught social_security__2026 doing exactly that). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
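The base-period filter can be sketched as a check on the column suffix. The `variable__year` naming is inferred from the `social_security__2026` example above; the function name, the 2024 base year default, and the sample columns are assumptions.

```python
def keep_base_period(columns, base_year=2024):
    """Split columns into those kept (base period or unsuffixed) and dropped."""
    kept, dropped = [], []
    for name in columns:
        _, separator, period = name.rpartition("__")
        if separator and period.isdigit() and int(period) != base_year:
            dropped.append(name)  # a pre-uprated out-year column
        else:
            kept.append(name)
    return kept, dropped


# Illustrative column names.
frame_columns = [
    "social_security__2024",
    "social_security__2026",
    "employment_income__2024",
    "employment_income__2035",
    "household_id",
]
kept_columns, dropped_columns = keep_base_period(frame_columns)
print(kept_columns, dropped_columns)
```

Returning the dropped names as well makes it easy to log what was removed, in the spirit of the leak guard mentioned in the message.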

One paid cell (2026/option1/static) on the populace baselines under run prefix v2pop_tr2026_20260611, per Max's chat authorizations of June 9-11. Includes the baseline-dataset manifest (16 years, populace base pinned to populace-us-2024-9f1260b-20260611), the expected schema manifest from the free local proof, and the 16-year baseline diagnostics battery (income tax 7.8-10.7% of GDP, AGI 57-65%, all gates passed). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The per-run spawn guard compares run-scoped counters against the spawned-call list, which still carried the 24 v1 behavioral-endpoint records; the first v2pop attempt consumed its nonce at that guard before any paid call was created. History is preserved under previous_run_call_history. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Sentinel verified end to end: durable R2 artifacts, pre-approved schema hash matched, and the option1 2026 impact of -108.8B equals the TR2026 post-OBBBA baseline taxation-of-benefits total exactly. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Fourteen reforms (option1-12, reverse_roth, tax93) across the every-fifth-year panel. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The ledger carries the v2pop panel's reservation and spawn records, the CSV allowlist covers TR2026 inputs plus the baseline diagnostics battery, the HI payroll guard checks the TR2026 ratio path, and the year-dataset build directories are ignored — their hashes live in the baseline manifest. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The diagram now shows the real flow — populace 2024, TR2026, CMS 2026, and CBO inputs feeding the four-stage yearly build with publication gates — and the prose explains demographics-in-weights versus economics-in-values, the 2075+ support augmentation, the TR2026 target set, and the fourteen-reform every-fifth-year panel. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

All 224 static cells validated in R2 with one pip-freeze hash; the ledger archives those spawn records and approves the 28 behavioral endpoint cells. Also adds the explainer-data builder for the general-audience dashboard rework. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The first behavioral launch died with the local modal client (DNS failure tore down the ephemeral app, killing all 28 workers before any artifact landed). Relaunch uses modal run --detach. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The durable artifact path is keyed by year and reform only, so a behavioral launch into the static panel's prefix would have overwritten its endpoint cells. The first behavioral launch was cancelled with all static artifacts verified intact; the ledger archives those records and approves the relaunch on a scoring-distinct prefix. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The dataset_path tests now build a throwaway policyengine-us runtime and stamp its version and git SHA into the fixture metadata, matching the contract requirements that landed with the full-H5 publication work. The ledger re-approves the behavioral relaunch under the guard-hardened code bundle. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The full 14-reform panel on the clone-free populace baselines, with the results contract carrying number-level lineage (224 exact cells trace to their no-clone baseline and scenario H5s by SHA-256; 826 interpolated display rows name their anchor years). Full repeal at 2026 is -108.8B, matching the TR2026 baseline taxation-of-benefits total exactly. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The static panel, behavioral endpoints, and their display/metadata artifacts join the tracked release surface (results/ is otherwise ignored), and the explainer tooltip formatter matches recharts' Formatter signature so the production type-check passes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…e guards The baseline-assumptions audit artifacts (aggregates, calibration targets/diagnostics, indexed parameters, metadata, public manifest) now carry the TR2026 current-law scenario id and the 1.700.2 runtime that built the datasets. Release-surface guards point at the v2pop panel and behavioral endpoints, the raw-match check expects 224 cells, and the income-tax realism band reflects the populace baseline (federal income tax ~8-11% of GDP, not v1's over-calibrated 25%+). Superseded 20260522 result CSVs are dropped from the tracked surface. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The baseline calibration audit and release package reference each year's .h5.metadata.json by path; tracking the small (10KB) sidecars while keeping the large H5 payloads ignored lets those references resolve in a clean CI checkout. Fixes the Python Guards failure where the package could not find the dataset metadata it cites. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Scope moves to fourteen reforms (adding the reverse-Roth proposal and the 93% benchmark); the baseline becomes the 2026 Trustees current law, which carries OBBBA natively and removes the post-legislation bridge. The methods section now documents the populace base and the demographics-in-weights / economics-in-values construction (stages A-D, CBO growth capped at the Trustees GDP path), and the late-year section records that the clone-free populace base passes every far-horizon gate (contributor ESS 107-156) so no synthetic records are used. Exhibits regenerate from the 224-cell v2pop panel; the labor-supply exhibit reflects the now-published behavioral endpoints; 2026 Trustees and populace citations are added. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Records that headline cells were cross-checked against blind first-principles estimates, and that full benefit inclusion is broad-based across benefit-reliant retirees rather than concentrated at the top, a composition effect the microsimulation captures and an aggregate shortcut misses. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

For each of the fourteen reforms and sixteen modeled years, the change in household net income by baseline income decile, computed from the saved reform microdata against a baseline simulation (MicroDataFrame weighting only). The dashboard reform view gains a decile bar chart with a $/% toggle and a 2026-2100 year slider; non-modeled years interpolate client-side and are badged, matching the revenue path's display rule. The reverse-Roth U-shape and full repeal's monotone gradient are both legible. Builder, loader, component, and a structure test included. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Independent review caught three actionable issues:

- The percent-change view divided by a non-positive baseline in the bottom decile, flipping the sign (full repeal's gain showed as a loss) and producing a -63% reverse-Roth outlier that blew out the axis. Percent change is now suppressed (null) where a decile's aggregate baseline net income is not positive; the chart drops those bars and notes the omission, and interpolation is null-aware. Max |pct| across all cells falls from 63.5% to 2.5%.
- avg_change used a raw sum of the weight column; it now uses the MicroDataFrame weighted mean, honoring the weighting rule.
- Hardened the client interpolation to join anchors by decile rather than position, and added tests for suppression, bounded percentages, and the reverse-Roth U-shape.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
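The first two fixes can be sketched in plain Python. The function names and numbers are illustrative assumptions; the project itself relies on MicroDataFrame for weighting, which this sketch does not use.

```python
def decile_percent_change(baseline_total, reform_total):
    """Percent change in a decile's net income, or None when undefined."""
    if baseline_total <= 0:
        return None  # suppressed: dividing here would flip or inflate the sign
    return (reform_total - baseline_total) / baseline_total * 100.0


def weighted_mean(values, weights):
    """Weighted mean of household-level changes."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Illustrative decile aggregates.
print(decile_percent_change(-50.0, -20.0))     # bottom decile, negative baseline
print(decile_percent_change(1000.0, 1025.0))

# Illustrative household changes and weights.
print(weighted_mean([100.0, 300.0], [3.0, 1.0]))
```

In the negative-baseline case the reform raises income from -50 to -20, yet the naive ratio would report -60%, which is the sign flip the review describes.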

Second-pass review note: the real max across all cells is ~2.5%, so a 10% ceiling still catches a sign-flip regression with ample headroom. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The donor-clone far-horizon support machinery, and the pre-Bible modal_refresh/compute path it lived in, are fully superseded by the reform_full_h5 Bible path and back no live results. Delete compute.py, modal_refresh.py, run_modal_refresh.py, recover_modal_cells_run.py and their tests; drop the modal-refresh CLI command; excise the donor-gate block from runtime_config and its tests; remove the redundant no-clone baseline duplicate; and clean donor/clone references from the docs and paper. Published provenance (noclone run IDs, baseline manifest, donor_clone_household_count=0 sidecar field) and the PolicyEngine clone_system branch API are preserved. Also broaden a repro_freeze import guard from ModuleNotFoundError to ImportError so the full test suite collects cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…te stack

Reform panel now runs on the certified populace base (policyengine.py 4.17.5 / policyengine-us 1.729.0) via a single resumable Modal orchestrator.

- modal_batch/run_panel.py: idempotent baseline build + reform scoring, keyed to src.selected_cells (annual 2026-2035 + 5-year + option12 junctures = 27y).
- Behavioral (conventional) scoring is ENDPOINT-ONLY by construction (BEHAVIORAL_ENDPOINT_YEARS): 2026/2100 multipliers interpolated across years, never fanned out per year.
- TOB OASDI/HI decomposition reuses reform_full_h5_worker.materialize_tob_revenue_pair (modal_batch/decomposition.py, endpoints) -> scripts/build_dashboard_results.py.
- reforms.deduct_employee_social_security_payroll_tax returns a reform set (params + variable-only deduction) so LSR scoring no longer hits the pe-us 1.729.0 nested-from_dict failure.
- Contributor support-gate floor lowered to 35 and applied to every projection year (no late-year carve-out); CONTRIBUTOR_GATE_START_YEAR removed.
- Paper: operational sentinel exhibit removed (belongs in docs/ per Release Rule); number exhibits + dashboard results.csv regenerated on the new base.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
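The endpoint-only rule implies that behavioral results for intermediate years come from a multiplier interpolated between the 2026 and 2100 endpoints. A sketch under stated assumptions: the interpolation is taken to be linear in the year, the multiplier is taken to be the behavioral-to-static ratio, and the function name and all figures are hypothetical.

```python
def interpolate_multiplier(year, start_multiplier, end_multiplier,
                           start_year=2026, end_year=2100):
    """Linearly interpolate a behavioral/static multiplier between endpoints."""
    share = (year - start_year) / (end_year - start_year)
    return start_multiplier + share * (end_multiplier - start_multiplier)


# Placeholder endpoint multipliers (not model output).
multiplier_2026 = 0.9
multiplier_2100 = 0.8

static_impact_2063 = 200.0  # placeholder static result, billions of dollars
behavioral_2063 = static_impact_2063 * interpolate_multiplier(
    2063, multiplier_2026, multiplier_2100
)
print(behavioral_2063)
```

Only two behavioral simulations are needed per reform under this scheme, which is why the panel never fans behavioral scoring out per year.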

Per Codex adversarial review of 3dcdce5:

- score_cell now refuses non-endpoint behavioral cells directly (the orchestrator spawn sites were guarded, but a direct call was the last unguarded path to the per-year LSR over-run).
- Conventional reform dispatch coerces only a raw param dict via from_dict; a builder returning a Reform or reform-set tuple (reverse_roth) is used as-is, fixing the "'tuple' has no .items()" blocker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
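The direct guard in the first item might look like the sketch below. The function name `check_scorable`, the `"behavioral"` / `"static"` labels, and the assumed contents of `BEHAVIORAL_ENDPOINT_YEARS` are illustrative; the real `score_cell` signature is not shown in this PR text.

```python
BEHAVIORAL_ENDPOINT_YEARS = (2026, 2100)  # assumed contents of the constant


def check_scorable(year, scoring):
    """Raise if a behavioral cell is requested for a non-endpoint year."""
    if scoring == "behavioral" and year not in BEHAVIORAL_ENDPOINT_YEARS:
        raise ValueError(
            f"behavioral scoring is endpoint-only; got year {year}"
        )


check_scorable(2030, "static")        # static cells may use any panel year
check_scorable(2100, "behavioral")    # endpoint year is accepted
try:
    check_scorable(2030, "behavioral")
except ValueError as err:
    print(err)
```

Placing the check inside the scoring function, rather than only at the spawn sites, means a direct call cannot bypass it.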

…sion) Per Codex review: run_panel._assemble wrote results/reform_panel.json in a {"scoring": ...} schema, colliding with the canonical {"reforms": ...} panel that scripts/assemble_reform_panel.py writes and build_dashboard_results.py reads. Raw orchestrator output now goes to results/run_panel_raw.json. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per Codex review: `make data` and the README quick-start invoked the deleted scripts/generate_policy_impacts.py. Both now point at the Modal panel + assemble + build_dashboard_results flow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add an Overview landing page (the new default) with a plain-language guide, the 'how Social Security benefits are taxed today' explainer, and clickable reform family cards that point to the sidebar. Move Methodology to its own About page and remove both it and the benefit-taxation explainer from every reform view, so reform pages open straight to the reform. Sidebar groups are now Overview / Benefit tax rules / Structural swaps / About (Methodology, Baseline model, paper). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

MaxGhenis and others added 10 commits June 10, 2026 00:53

Add v2 every-5y launch runbook

4ab90e3

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Include reverse-Roth and tax93 in the every-5y launch plan

ce9eca0

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Align v2 method doc with the implemented pipeline

f89623f

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Format v2 modules with ruff

675bfde

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Fix stale TR2025 references in v2 docstrings

6544b7d

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

MaxGhenis and others added 19 commits June 10, 2026 11:53

Store approved cells in the ledger dict form the guard parses

d620422

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Track the v2pop panel cell set in the live status builder

e058417

MaxGhenis and others added 30 commits June 13, 2026 06:06

Tighten the distributional percent-change regression bound to 10%

73650e7

Clean CRFB release artifacts and distributional outputs

ceea4a2

Refresh CRFB share-facing provenance

f31406a

fixup! Remove dead clone/synthetic support and the legacy modal compu…

5e0a575

…te stack

fixup! Fix dead make-data target + README quick-start (deleted script)

7807b11

Consolidate CRFB result artifacts

6da1e67

Harden CRFB reform panel orchestration

de82fa7

Harden balanced fix recompute spec

c145e60

Use behavioral scoring labels throughout CRFB pipeline

5c683b0

Fail-close legacy CRFB scoring paths

a567429

Add balanced fix solvency baseline runner

7bf7ca7

Publish balanced fix dashboard results

1b81a4c

Fix 2100 HI payroll denominator for balanced fix

c23c0a6

Refresh baseline HI diagnostics after denominator fix

ee7aad3

Make SS solvent dashboard long-run only

042124c

Show empty early years for SS solvent charts

519a928

MaxGhenis wants to merge 69 commits into main from v2-baseline-method
