V2 baselines on the 2026 Trustees Report; reverse-Roth and 93% options#130

Draft
MaxGhenis wants to merge 69 commits into main from v2-baseline-method

@MaxGhenis

Summary

Rebuilds the long-horizon baseline datasets with a new division of labor — demographics in weights, economics in values — calibrated to the 2026 Trustees Reports (released June 9, 2026), and adds the two reform options from the May 22 CRFB meeting.

V2 baseline construction (src/v2_projection.py, src/v2_pipeline.py)

Per projection year: materialize the latest enhanced CPS at the target year → light entropy reweight to the Trustees age distribution → rescale values to Trustees aggregates (α: taxable payroll, cap-aware; β: OASDI cost; best-effort γ: beneficiary other income toward the TOB target) → donor-clone late-year support from 2075 (deterministic jittered clones of real contributor households) → final light entropy calibration hitting age, benefits, payroll, and the TR2026 OASDI/HI taxation-of-benefits series exactly, with v1-style income guards from 2075. Build-time enforcement of the documented publication gates; stage-by-stage sentinel diagnostics; per-year metadata sidecars.
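
The entropy reweight step can be illustrated with a minimal sketch. It assumes the generic minimum-relative-entropy formulation (new weights stay as close as possible to the starting weights while hitting linear targets), solved by Newton steps on the dual. The function name, the toy household matrix, and the targets are illustrative and are not taken from src/v2_projection.py.

```python
import numpy as np


def entropy_reweight(w0, A, targets, iters=50, tol=1e-10):
    """Minimise sum(w * log(w / w0)) subject to A.T @ w == targets.

    The optimum has the form w = w0 * exp(A @ lam); lam is found by
    Newton steps on the constraint residual.
    """
    lam = np.zeros(A.shape[1])
    for _ in range(iters):
        w = w0 * np.exp(A @ lam)
        resid = A.T @ w - targets
        if np.max(np.abs(resid) / targets) < tol:
            break
        jac = A.T @ (w[:, None] * A)  # derivative of resid with respect to lam
        lam -= np.linalg.solve(jac, resid)
    return w0 * np.exp(A @ lam)


# Toy data (hypothetical): 4 households; columns = persons under 65, persons 65+.
A = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 2.0], [3.0, 0.0]])
w0 = np.full(4, 100.0)
targets = np.array([650.0, 350.0])  # hypothetical age-group totals

w = entropy_reweight(w0, A, targets)
print(A.T @ w)
```

Because the solution is the starting weight times an exponential, every weight stays positive, which is one reason this family of methods suits a "light" reweight.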

Fixes three structural issues in the v1 datasets: the population-growth double count in aggregate-growth uprating (β ≈ −1.9%/yr corrects it), the weight-tilt concentration of the TOB base (Mark Sarney's "something is wrong with the 2100 data"), and pre-OBBBA targets under post-OBBBA law.

TR2026 targets (scripts/extract_tr2026_targets.py)

Section-aware extraction of intermediate-assumption series: OASDI cost (IV.B1 cost rate × VI.G1 payroll, since VI.G2 truncates at reserve depletion), taxable payroll and GDP (VI.G1), OASDI TOB (IV.B2 % × payroll), HI TOB (CMS 2026 Medicare expanded tables, annual through 2100). TR2026 includes OBBBA in current law, so the TR2025 post-OBBBA bridge (OACT letter deltas + provisional HI scaling) is retired. Population targets are interim: TR2026 V.A3 group totals applied to the TR2024 single-year-age shape until SSA posts the TR2026 single-year file. Dashboard denominators and trust-fund gap rates refreshed to TR2026.
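
As a small worked example of the "rate × payroll" derivation, the sketch below converts a percent-of-taxable-payroll series into dollars. The numbers are hypothetical placeholders, not TR2026 values, and the function name is illustrative rather than the one used in scripts/extract_tr2026_targets.py.

```python
def dollars_from_payroll_share(rate_pct, taxable_payroll):
    """Convert a percent-of-taxable-payroll figure into dollars."""
    return rate_pct / 100.0 * taxable_payroll


# Hypothetical inputs: a 15.0% cost rate and $11,000B of taxable payroll.
oasdi_cost = dollars_from_payroll_share(15.0, 11_000.0)
print(round(oasdi_cost, 1))
```

The same helper would apply to the OASDI taxation-of-benefits series, which the description also derives as a percent of payroll times payroll.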

New reform options

  • reverse_roth (May 22 CRFB meeting, Sarney/Colavito): tax 100% of Social Security benefits immediately and make employee OASDI payroll taxes deductible above the line; Medicare unchanged.
  • tax93: taxes 93% of benefits, the share Steve Goss's SSA analysis attributed to employer contributions and earnings. CRFB planned to interpolate this figure between the 90% and 95% options; here it is run exactly.

Both are registered for static and behavioral scoring and excluded from published results until full reform H5 cells exist under the production contract.
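
A toy calculation can make the two options concrete. This is only a sketch under simplifying assumptions: it looks at one person's taxable-income base, ignores thresholds and everything else in the tax code, and takes the taxable maximum as a caller-supplied, hypothetical parameter. The 6.2% employee OASDI rate is statutory; the function names are illustrative and are not the reform builders in this PR.

```python
EMPLOYEE_OASDI_RATE = 0.062  # statutory employee OASDI payroll-tax rate


def reverse_roth_taxable_change(benefits, already_taxable_benefits, wages, taxable_max):
    """Change in taxable income under the reverse-Roth sketch.

    All benefits become taxable, and the employee OASDI payroll tax
    becomes an above-the-line deduction.
    """
    added_benefits = benefits - already_taxable_benefits
    payroll_deduction = EMPLOYEE_OASDI_RATE * min(wages, taxable_max)
    return added_benefits - payroll_deduction


def taxable_benefits_at_share(benefits, share):
    """Benefits included in taxable income at a flat inclusion share (0.93 for tax93)."""
    return share * benefits


# Hypothetical worker with no benefits: taxable income falls by the deduction.
worker_change = reverse_roth_taxable_change(0.0, 0.0, 100_000.0, 150_000.0)
# Hypothetical retiee-style case with no wages: taxable income rises by the newly taxed benefits.
retiree_change = reverse_roth_taxable_change(30_000.0, 10_000.0, 0.0, 150_000.0)
```

Workers gain a deduction while beneficiaries lose the exclusion, which is why the distributional pattern differs from a pure inclusion-share change such as tax93.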

Status

  • 51 unit tests pass (tests/test_v2_projection.py, tests/test_reforms.py, CI guards)
  • Dashboard builds and lints clean
  • 16-year dataset build on TR2026 targets (in progress)
  • Local proof → paid sentinel → 224-cell static panel + 28 behavioral endpoint cells per docs/current/v2-launch-runbook.md
  • Results aggregation, dashboard/paper refresh

🤖 Generated with Claude Code

MaxGhenis and others added 10 commits June 10, 2026 00:53
Per-year construction: materialize the latest enhanced CPS at the target
year, lightly entropy-reweight to the Trustees age distribution, rescale
values to Trustees aggregates (alpha for taxable payroll, beta for OASDI
cost, best-effort gamma for taxation-of-benefits other income), append
jittered donor clones of real contributor households from 2075, then run
a final light entropy calibration to age, Social Security, payroll, and
post-OBBBA TOB targets with v1-style income guards. Calibrating to the
post-OBBBA TOB series removes the pre/post-OBBBA mismatch carried by the
v1 datasets; value-level scaling removes the population-growth double
count in aggregate uprating series.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Also: best-effort gamma (TOB is nearly inelastic to other income at far
horizons once 85% inclusion saturates), donor-clone late-year support
with deterministic jitter, v1-style self-referential income guards from
2075, and per-year contributor gates matching the documented runtime
defaults.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reverse Roth (per the May 22 CRFB meeting with Mark Sarney and Anthony
Colavito): immediately tax 100% of Social Security benefits and make
employee OASDI payroll taxes deductible above the line, leaving Medicare
on its current Roth-style basis. Ports the implementation drafted on the
add-reverse-roth-proposal worktree.

tax93: taxes 93% of benefits, the share Steve Goss's SSA analysis
attributed to employer contributions and earnings — an exact run of the
number CRFB planned to interpolate between the 90% and 95% options.

Both are runnable reform IDs and excluded from published results until
full reform H5 cells exist under the production contract.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
TR2026 (released June 9, 2026) incorporates OBBBA in current law, so its
taxation-of-benefits series replaces the TR2025 post-OBBBA bridge
entirely: OASDI TOB from IV.B2 percent of taxable payroll times VI.G1
payroll, HI TOB from the CMS 2026 Medicare Trustees expanded tables
(annual through 2100, no carry-forward). OASDI cost comes from the IV.B1
cost rate because the VI.G2 dollar table truncates at reserve depletion.
All series use intermediate assumptions; the extractor is section-aware
so alternatives cannot leak in.

Population targets are interim: TR2026 V.A3 group totals (under 20 /
20-64 / 65+, capturing the fertility and immigration revisions) applied
to the TR2024 single-year-age shape until SSA posts the TR2026
single-year file.

Also re-solve the earnings and benefit scales after donor-clone
augmentation so clone mass cannot push taxable payroll off target.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Taxable payroll and GDP from VI.G1, OASDI gap rates from IV.B1, HI gap
rates from the CMS expanded tables. HI taxable payroll scales the
TR2026 OASDI payroll by the TR2025 HI/OASDI ratio path because the
Medicare expanded tables do not publish HI payroll levels.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 10, 2026

MaxGhenis and others added 19 commits June 10, 2026 11:53
…inel

The new baseline-diagnostics battery caught gamma 1.62 pushing 2026 AGI
to 155% of GDP while chasing the higher TR2026 HI taxation-of-benefits
target. Gamma is now a bounded calibration nudge; broader donor support
(3,000 donors x 6 jittered clones) carries the late-year residual, and
every build logs AGI and income tax as shares of GDP, warning when AGI
exceeds GDP.
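
A minimal sketch of the two ideas in this message, under stated assumptions: the gamma bounds below are hypothetical (the commit does not state them), and the function and logger names are illustrative rather than the ones in the pipeline.

```python
import logging

logger = logging.getLogger("v2_diagnostics_sketch")  # hypothetical logger name


def bounded_gamma(raw_gamma, lower, upper):
    """Clamp the other-income scale so it acts as a nudge, not a free parameter."""
    return max(lower, min(upper, raw_gamma))


def gdp_share_diagnostics(year, agi, income_tax, gdp):
    """Log AGI and income tax as shares of GDP; warn when AGI exceeds GDP."""
    shares = {
        "agi_share_of_gdp": agi / gdp,
        "income_tax_share_of_gdp": income_tax / gdp,
    }
    logger.info("year=%s shares=%s", year, shares)
    if agi > gdp:
        logger.warning("year=%s AGI exceeds GDP (%.0f%%)", year, 100 * shares["agi_share_of_gdp"])
    return shares


# Hypothetical bounds; 1.62 is the unbounded value the diagnostics caught.
gamma = bounded_gamma(1.62, lower=0.9, upper=1.1)
shares = gdp_share_diagnostics(2026, agi=155.0, income_tax=10.0, gdp=100.0)
```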

The dashboard baseline tab gains a 'Baseline trajectories through 2100'
section: twelve small-multiple panels showing every calibrated series
against its TR2026 target plus the uncalibrated by-products (income
tax, AGI, beneficiary and worker counts vs TR2026 IV.B4, statutory
payroll-tax checks, income components), each badged calibrated or
diagnostic, with a spotlight table and download link. Data comes from
scripts/build_v2_baseline_diagnostics.py, which simulates every built
year dataset and attaches targets and references.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The 2026 diagnostic decomposition showed miscellaneous and partnership
income roughly doubling through the weight tilt alone: without guards,
the entropy solve closes taxation-of-benefits residuals by upweighting
households heavy in the most concentrated income types. The guards pin
ordinary non-payroll and preferential investment income totals at their
post-value-scaling levels in every year, not just from 2075.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The published enhanced CPS stores miscellaneous_income summing to
$9.3T weighted (real-world total is near $100B), with dozens of person
records pinned at exactly $795,294,848 — an imputation top-code
artifact, present in both the current and v1-era Hugging Face
revisions. Uprated across the 75-year horizon and amplified by donor
cloning (the pinned records maximize provisional income and therefore
dominate the TOB contributor pool), these records pushed late-year AGI
to several times GDP. Values above $10M per person are unambiguously
corrupt and are zeroed at materialization, with the repair logged and
stamped into each year's metadata. The upstream fix is tracked in
policyengine-us-data.
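
The repair described above can be sketched as follows. The $10M per-person ceiling and the pinned value come from this message; the function name and the shape of the report are assumptions for illustration.

```python
import numpy as np

PER_PERSON_CEILING = 10_000_000  # dollars; values above this are treated as corrupt


def repair_corrupt_income(values, ceiling=PER_PERSON_CEILING):
    """Zero out per-person values above the ceiling and report what changed."""
    values = np.asarray(values, dtype=float)
    corrupt = values > ceiling
    repaired = np.where(corrupt, 0.0, values)
    report = {"ceiling": ceiling, "n_zeroed": int(corrupt.sum())}
    return repaired, report


# Toy records: one carries the pinned top-code artifact value.
misc_income = [1_200.0, 795_294_848.0, 0.0, 9_999_999.0]
repaired, report = repair_corrupt_income(misc_income)
```

Returning the report alongside the repaired array makes it straightforward to log the repair and stamp it into per-year metadata.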

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The SOI uprating extensions grow several income categories near 5% per
year forever against TR2026 nominal GDP near 3.5%, so AGI outruns GDP
at far horizons (132% by the 2090s even after the miscellaneous-income
repair). CBO-vintage growth stands through 2034; from 2035 each
category's cumulative growth is capped at the GDP path, per the project
direction that non-targeted income follows the CBO forecast and similar
growth thereafter to the extent it does not contradict the Trustees.
Donor support widens to 4,000 donors x 8 clones for the late-year
taxation-of-benefits contributor gates.
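
One way to read the cap is sketched below. It assumes cumulative growth is measured from 2034, the last uncapped year, and that a category is only ever pulled down to the GDP path, never pushed up to it. The function name and the toy 5% and 3.5% paths are illustrative; this is not the PR's cap_longrun_income_growth.

```python
def cap_growth_at_gdp_sketch(category_index, gdp_index, last_free_year=2034):
    """Cap a category's cumulative growth after last_free_year at the GDP path."""
    anchor_cat = category_index[last_free_year]
    anchor_gdp = gdp_index[last_free_year]
    capped = {}
    for year, value in category_index.items():
        if year <= last_free_year:
            capped[year] = value  # CBO-vintage growth stands
        else:
            ceiling = anchor_cat * gdp_index[year] / anchor_gdp
            capped[year] = min(value, ceiling)
    return capped


years = range(2034, 2037)
gdp = {y: 1.035 ** (y - 2034) for y in years}   # toy GDP path
fast = {y: 1.05 ** (y - 2034) for y in years}   # grows faster than GDP
slow = {y: 1.02 ** (y - 2034) for y in years}   # grows slower than GDP

capped_fast = cap_growth_at_gdp_sketch(fast, gdp)
capped_slow = cap_growth_at_gdp_sketch(slow, gdp)
```

A category that grows more slowly than GDP passes through unchanged, since the cap is a ceiling rather than a target.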

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A formatting-mismatched patch added cap_longrun_income_growth and its
metadata reference without the call, so every year crashed at metadata
assembly (NameError) or kept uncapped growth. Verified against real
parameters: qualified dividends at 2095 cap at x0.46, interest stays
untouched because it grows slower than GDP. A regression test pins the
call site.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Populace stores 'unknown' race codes that the strict set_input
round-trip rejects even though the native loader tolerates them.
Invalid enum values coerce to each variable's default, logged and
stamped into metadata.
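
A minimal sketch of the coercion, assuming enum values are stored as member-name strings. The Race members and the choice of default below are hypothetical stand-ins, not the model's actual enum.

```python
from enum import Enum


class Race(Enum):  # hypothetical members for illustration only
    WHITE = "WHITE"
    BLACK = "BLACK"
    OTHER = "OTHER"


def coerce_to_enum(values, enum_cls, default):
    """Replace values that are not valid member names with the default's name."""
    valid = {member.name for member in enum_cls}
    coerced, n_invalid = [], 0
    for value in values:
        if value in valid:
            coerced.append(value)
        else:
            coerced.append(default.name)
            n_invalid += 1
    return coerced, {"enum": enum_cls.__name__, "n_coerced": n_invalid}


coerced, report = coerce_to_enum(["WHITE", "unknown", "BLACK"], Race, Race.OTHER)
```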

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Populace ships social_security alongside its four components; keeping
the stored aggregate would shadow the adds formula and break the
benefit-scaling stage. The frame now keeps components and lets
aggregates recompute, covering both the populace layout and the
enhanced-CPS behavioral-response case. Populace is also pinned to a
stable project path because the Hugging Face cache evicted the snapshot
mid-build.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Populace ships pre-uprated columns for 2024-2035 per variable; only the
base period belongs in the frame so the pipeline stays the single
uprating authority and out-year aggregates cannot bypass the value
stages (the leak guard caught social_security__2026 doing exactly
that).
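
The column filter can be sketched as below. It assumes period-suffixed columns follow the `variable__year` pattern seen in `social_security__2026` and that 2024 is the base period; the function name is illustrative.

```python
def keep_base_period_columns(columns, base_year=2024):
    """Split columns into those kept (base period or unsuffixed) and those dropped."""
    kept, dropped = [], []
    for name in columns:
        _variable, sep, period = name.rpartition("__")
        if sep and period.isdigit() and int(period) != base_year:
            dropped.append(name)  # pre-uprated out-year column
        else:
            kept.append(name)
    return kept, dropped


kept, dropped = keep_base_period_columns(
    ["social_security__2024", "social_security__2026", "age"]
)
```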

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
One paid cell (2026/option1/static) on the populace baselines under run
prefix v2pop_tr2026_20260611, per Max's chat authorizations of June
9-11. Includes the baseline-dataset manifest (16 years, populace base
pinned to populace-us-2024-9f1260b-20260611), the expected schema
manifest from the free local proof, and the 16-year baseline
diagnostics battery (income tax 7.8-10.7% of GDP, AGI 57-65%, all
gates passed).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The per-run spawn guard compares run-scoped counters against the
spawned-call list, which still carried the 24 v1 behavioral-endpoint
records; the first v2pop attempt consumed its nonce at that guard
before any paid call was created. History is preserved under
previous_run_call_history.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sentinel verified end to end: durable R2 artifacts, pre-approved schema
hash matched, and the option1 2026 impact of -108.8B equals the TR2026
post-OBBBA baseline taxation-of-benefits total exactly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Fourteen reforms (option1-12, reverse_roth, tax93) across the
every-fifth-year panel.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The ledger carries the v2pop panel's reservation and spawn records, the
CSV allowlist covers TR2026 inputs plus the baseline diagnostics
battery, the HI payroll guard checks the TR2026 ratio path, and the
year-dataset build directories are ignored — their hashes live in the
baseline manifest.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The diagram now shows the real flow — populace 2024, TR2026, CMS 2026,
and CBO inputs feeding the four-stage yearly build with publication
gates — and the prose explains demographics-in-weights versus
economics-in-values, the 2075+ support augmentation, the TR2026 target
set, and the fourteen-reform every-fifth-year panel.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
All 224 static cells validated in R2 with one pip-freeze hash; the
ledger archives those spawn records and approves the 28 behavioral
endpoint cells. Also adds the explainer-data builder for the
general-audience dashboard rework.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The first behavioral launch died with the local modal client (DNS
failure tore down the ephemeral app, killing all 28 workers before any
artifact landed). Relaunch uses modal run --detach.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The durable artifact path is keyed by year and reform only, so a
behavioral launch into the static panel's prefix would have overwritten
its endpoint cells. The first behavioral launch was cancelled with all
static artifacts verified intact; the ledger archives those records and
approves the relaunch on a scoring-distinct prefix.
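
The collision is easy to see in a toy key builder. The path layout and prefixes below are hypothetical; the only property carried over from the message is that the key depends on year and reform but not on scoring mode.

```python
def artifact_key(prefix, year, reform_id):
    """Hypothetical durable-artifact key: scoring mode is not part of it."""
    return f"{prefix}/{year}/{reform_id}.h5"


static_key = artifact_key("panel", 2026, "option1")
behavioral_same_prefix = artifact_key("panel", 2026, "option1")
behavioral_own_prefix = artifact_key("panel_behavioral", 2026, "option1")

# Same prefix: the behavioral cell would overwrite the static one.
print(static_key == behavioral_same_prefix)
# Scoring-distinct prefix: the two cells no longer share a key.
print(static_key == behavioral_own_prefix)
```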

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The dataset_path tests now build a throwaway policyengine-us runtime
and stamp its version and git SHA into the fixture metadata, matching
the contract requirements that landed with the full-H5 publication
work. The ledger re-approves the behavioral relaunch under the
guard-hardened code bundle.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
MaxGhenis and others added 30 commits June 13, 2026 06:06
The full 14-reform panel on the clone-free populace baselines, with the
results contract carrying number-level lineage (224 exact cells trace
to their no-clone baseline and scenario H5s by SHA-256; 826 interpolated
display rows name their anchor years). Full repeal at 2026 is -108.8B,
matching the TR2026 baseline taxation-of-benefits total exactly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The static panel, behavioral endpoints, and their display/metadata
artifacts join the tracked release surface (results/ is otherwise
ignored), and the explainer tooltip formatter matches recharts'
Formatter signature so the production type-check passes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e guards

The baseline-assumptions audit artifacts (aggregates, calibration
targets/diagnostics, indexed parameters, metadata, public manifest) now
carry the TR2026 current-law scenario id and the 1.700.2 runtime that
built the datasets. Release-surface guards point at the v2pop panel and
behavioral endpoints, the raw-match check expects 224 cells, and the
income-tax realism band reflects the populace baseline (federal income
tax ~8-11% of GDP, not v1's over-calibrated 25%+). Superseded 20260522
result CSVs are dropped from the tracked surface.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The baseline calibration audit and release package reference each year's
.h5.metadata.json by path; tracking the small (10KB) sidecars while
keeping the large H5 payloads ignored lets those references resolve in a
clean CI checkout. Fixes the Python Guards failure where the package
could not find the dataset metadata it cites.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Scope moves to fourteen reforms (adding the reverse-Roth proposal and
the 93% benchmark); the baseline becomes the 2026 Trustees current law,
which carries OBBBA natively and removes the post-legislation bridge.
The methods section now documents the populace base and the
demographics-in-weights / economics-in-values construction (stages A-D,
CBO growth capped at the Trustees GDP path), and the late-year section
records that the clone-free populace base passes every far-horizon gate
(contributor ESS 107-156) so no synthetic records are used. Exhibits
regenerate from the 224-cell v2pop panel; the labor-supply exhibit
reflects the now-published behavioral endpoints; 2026 Trustees and
populace citations are added.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Records that headline cells were cross-checked against blind first-
principles estimates, and that full benefit inclusion is broad-based
across benefit-reliant retirees rather than concentrated at the top — a
composition effect the microsimulation captures and an aggregate
shortcut misses.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
For each of the fourteen reforms and sixteen modeled years, the change
in household net income by baseline income decile, computed from the
saved reform microdata against a baseline simulation (MicroDataFrame
weighting only). The dashboard reform view gains a decile bar chart with
a $/% toggle and a 2026-2100 year slider; non-modeled years interpolate
client-side and are badged, matching the revenue path's display rule.
The reverse-Roth U-shape and full repeal's monotone gradient are both
legible. Builder, loader, component, and a structure test included.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Independent review caught three actionable issues:
- The percent-change view divided by a non-positive baseline in the
  bottom decile, flipping the sign (full repeal's gain showed as a loss)
  and producing a -63% reverse-Roth outlier that blew out the axis.
  Percent change is now suppressed (null) where a decile's aggregate
  baseline net income is not positive; the chart drops those bars and
  notes the omission, and interpolation is null-aware. Max |pct| across
  all cells falls from 63.5% to 2.5%.
- avg_change used a raw sum of the weight column; it now uses the
  MicroDataFrame weighted mean, honoring the weighting rule.
- Hardened the client interpolation to join anchors by decile rather
  than position, and added tests for suppression, bounded percentages,
  and the reverse-Roth U-shape.
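
The three fixes can be sketched in plain Python. The helper names are illustrative, and weighted_mean is a stand-in for the MicroDataFrame weighted mean rather than a call into that library.

```python
def percent_change(baseline_total, reform_total):
    """Percent change in a decile's aggregate net income.

    Returns None when the baseline is not positive, because dividing by it
    would flip or distort the sign.
    """
    if baseline_total <= 0:
        return None
    return 100.0 * (reform_total - baseline_total) / baseline_total


def weighted_mean(values, weights):
    """Weighted mean of values (stand-in for the MicroDataFrame weighted mean)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def interpolate_nullable(left, right, t):
    """Linear interpolation that propagates a suppressed (None) anchor."""
    if left is None or right is None:
        return None
    return left + t * (right - left)


# A gain of 10 on a baseline of -50 would naively print as -20%; it is suppressed.
suppressed = percent_change(-50.0, -40.0)
```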

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Second-pass review note: the real max across all cells is ~2.5%, so a
10% ceiling still catches a sign-flip regression with ample headroom.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The donor-clone far-horizon support machinery, and the pre-Bible
modal_refresh/compute path it lived in, are fully superseded by the
reform_full_h5 Bible path and back no live results. Delete compute.py,
modal_refresh.py, run_modal_refresh.py, recover_modal_cells_run.py and
their tests; drop the modal-refresh CLI command; excise the donor-gate
block from runtime_config and its tests; remove the redundant no-clone
baseline duplicate; and clean donor/clone references from the docs and
paper. Published provenance (noclone run IDs, baseline manifest,
donor_clone_household_count=0 sidecar field) and the PolicyEngine
clone_system branch API are preserved.

Also broaden a repro_freeze import guard from ModuleNotFoundError to
ImportError so the full test suite collects cleanly.
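
The guard change matters because ModuleNotFoundError is a subclass of ImportError and covers only a missing module; a missing name inside an existing module raises plain ImportError. The sketch below demonstrates the difference with deliberately nonexistent names (the helper is illustrative, not code from repro_freeze).

```python
def classify_import_failure(statement):
    """Run an import statement and report which exception, if any, it raises."""
    try:
        exec(statement, {})
    except ModuleNotFoundError:
        return "ModuleNotFoundError"
    except ImportError:
        return "ImportError"
    return "ok"


missing_module = classify_import_failure("import surely_missing_module_xyz")
missing_name = classify_import_failure("from os import surely_missing_name_xyz")
```

A guard written as `except ModuleNotFoundError` would let the second case propagate and abort test collection, whereas `except ImportError` catches both.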

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reform panel now runs on the certified populace base (policyengine.py 4.17.5 /
policyengine-us 1.729.0) via a single resumable Modal orchestrator.

- modal_batch/run_panel.py: idempotent baseline build + reform scoring, keyed
  to src.selected_cells (annual 2026-2035 + 5-year + option12 junctures = 27y).
- Behavioral (conventional) scoring is ENDPOINT-ONLY by construction
  (BEHAVIORAL_ENDPOINT_YEARS): 2026/2100 multipliers interpolated across years,
  never fanned out per year.
- TOB OASDI/HI decomposition reuses reform_full_h5_worker.materialize_tob_revenue_pair
  (modal_batch/decomposition.py, endpoints) -> scripts/build_dashboard_results.py.
- reforms.deduct_employee_social_security_payroll_tax returns a reform set
  (params + variable-only deduction) so LSR scoring no longer hits the
  pe-us 1.729.0 nested-from_dict failure.
- Contributor support-gate floor lowered to 35 and applied to every projection
  year (no late-year carve-out); CONTRIBUTOR_GATE_START_YEAR removed.
- Paper: operational sentinel exhibit removed (belongs in docs/ per Release
  Rule); number exhibits + dashboard results.csv regenerated on the new base.
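
The endpoint-only behavioral rule above can be sketched as follows. It assumes the multiplier is the ratio of behavioral to static impact at each endpoint and that interpolation between 2026 and 2100 is linear; both are assumptions, and the helper names are illustrative.

```python
BEHAVIORAL_ENDPOINT_YEARS = (2026, 2100)  # endpoint years named in the message


def endpoint_multiplier(static_impact, behavioral_impact):
    """Behavioral-to-static ratio at an endpoint year."""
    return behavioral_impact / static_impact


def interpolated_multiplier(year, m_start, m_end, endpoints=BEHAVIORAL_ENDPOINT_YEARS):
    """Linearly interpolate the multiplier between the two endpoint years."""
    start, end = endpoints
    t = (year - start) / (end - start)
    return m_start + t * (m_end - m_start)


def behavioral_estimate(year, static_impact, m_start, m_end):
    """Scale a year's static impact by the interpolated multiplier."""
    return static_impact * interpolated_multiplier(year, m_start, m_end)


# Hypothetical multipliers of 0.9 (2026) and 1.1 (2100).
mid_multiplier = interpolated_multiplier(2063, 0.9, 1.1)
```

Only two behavioral simulations per reform are needed under this scheme; every other year reuses its static cell.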

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per Codex adversarial review of 3dcdce5:
- score_cell now refuses non-endpoint behavioral cells directly (the
  orchestrator spawn sites were guarded, but a direct call was the last
  unguarded path to the per-year LSR over-run).
- Conventional reform dispatch coerces only a raw param dict via from_dict;
  a builder returning a Reform or reform-set tuple (reverse_roth) is used
  as-is, fixing the "'tuple' has no .items()" blocker.
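
Both fixes can be sketched with stand-ins. The from_dict argument below is a placeholder callable, not the PolicyEngine API, and the "behavioral" scoring label and function names are assumptions for illustration.

```python
def resolve_reform(built, from_dict):
    """Coerce only a raw parameter dict; pass anything else through unchanged."""
    if isinstance(built, dict):
        return from_dict(built)
    return built  # already a reform object or a reform-set tuple


def require_endpoint_year(year, scoring, endpoints=(2026, 2100)):
    """Refuse behavioral cells outside the endpoint years."""
    if scoring == "behavioral" and year not in endpoints:
        raise ValueError(f"behavioral scoring is endpoint-only; got year {year}")


def fake_from_dict(params):  # placeholder for the real coercion
    return ("coerced", params)


reform_set = ("params_reform", "variable_reform")  # hypothetical reform-set tuple
from_raw = resolve_reform({"some.parameter": 1.0}, fake_from_dict)
from_tuple = resolve_reform(reform_set, fake_from_dict)
```

Checking the type before coercing avoids calling `.items()` on a tuple, which is the failure mode the message quotes.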

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sion)

Per Codex review: run_panel._assemble wrote results/reform_panel.json in a
{"scoring": ...} schema, colliding with the canonical {"reforms": ...} panel
that scripts/assemble_reform_panel.py writes and build_dashboard_results.py
reads. Raw orchestrator output now goes to results/run_panel_raw.json.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per Codex review: `make data` and the README quick-start invoked the deleted
scripts/generate_policy_impacts.py. Both now point at the Modal panel +
assemble + build_dashboard_results flow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an Overview landing page (the new default) with a plain-language guide, the 'how Social Security benefits are taxed today' explainer, and clickable reform family cards that point to the sidebar. Move Methodology to its own About page and remove both it and the benefit-taxation explainer from every reform view, so reform pages open straight to the reform. Sidebar groups are now Overview / Benefit tax rules / Structural swaps / About (Methodology, Baseline model, paper).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>