🔧 Rewire distribution_generator to external/latest catalog URLs #22
Open
paarriagadap wants to merge 33 commits into
- Switch THOUSAND_BINS_URL, THOUSAND_BINS_HISTORICAL_URL and
THOUSAND_BINS_HISTORICAL__ALL_LOGNORMAL_URL to
external/poverty_inequality/latest/* (versionless).
- Switch the PIP percentiles and main-indicators reads from the legacy
wide-flat world_bank_pip_legacy tables to the dimensional
external/world_bank_pip/{percentiles,complete_series} tables, with a
small filter/merge block in run() to rebuild the flat shape the plot
code expects.
- Drop PIP_VERSION, THOUSAND_BINS_VERSION and THOUSAND_BINS_HISTORICAL_VERSION.
Underlying values are identical to the legacy tables when versions match
(spot-checked World 2020: mean=19.83, median=8.2, top1_thr=157.20,
decile9_thr=50.0, headcount_ratio_3000=82.98).
External historical_poverty and world_bank_pip URLs only become live once
owid/etl#6160 merges.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
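The filter/merge block in run() is not shown in this PR. Below is a minimal pure-Python sketch of the idea. It assumes the dimensional tables arrive as long rows with hypothetical keys `country`, `year`, `indicator`, `value`, plus dimension keys such as a PPP version. The real column names, and whether the join is inner or left, may differ. `rebuild_flat` and `merge_flat` are illustrative names, not functions from the script.

```python
from collections import defaultdict


def rebuild_flat(rows, **filters):
    """Pivot long rows into {(country, year): {indicator: value}}.

    Hypothetical row shape: dicts with "country", "year", "indicator", "value"
    plus dimension keys. Rows whose dimensions do not all match `filters`
    are dropped before pivoting, mimicking the filter step.
    """
    flat = defaultdict(dict)
    for row in rows:
        if all(row.get(dim) == wanted for dim, wanted in filters.items()):
            flat[(row["country"], row["year"])][row["indicator"]] = row["value"]
    return dict(flat)


def merge_flat(left, right):
    """Inner-join two flat tables on (country, year); inner join is an assumption."""
    return {key: {**left[key], **right[key]} for key in left.keys() & right.keys()}


# Toy rows, not real PIP values.
rows = [
    {"country": "A", "year": 2020, "ppp_version": 2021, "indicator": "mean", "value": 2.0},
    {"country": "A", "year": 2020, "ppp_version": 2017, "indicator": "mean", "value": 1.5},
]
flat = rebuild_flat(rows, ppp_version=2021)
other = {("A", 2020): {"decile9_thr": 5.0}, ("B", 2020): {"decile9_thr": 1.0}}
print(merge_flat(flat, other))
```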
- New cut_percentile option on pen_parade (default 95): caps y above the
cut so the curve plateaus at the top, leaving room for labels on the
right. The p99 label moves to a top-of-chart annotation anchored at
x=cut_percentile when the cut is below 99.
- Replace default $-amount y-tick labels with the reference-line labels
themselves (IPL, World mean/median, p90, p99, country medians for
Norway/US/Sweden/UK), with per-label collision handling.
- Country median rows pulled from complete_series, preferring consumption
over income where both exist; pre-merged into df_main_indicators.
- New copy: "→ The richest 10% have an income of more than $X per {period}"
and "↑ The richest 1% live on more than $X per {period}".
- Dollar formatting: 2 decimals for daily values, 0 for monthly/yearly.
- Pen-parade figure aspect 1:1.25 (1000x1250), right margin reserved
for labels.
- Other pen_parade/distributional_plots blocks temporarily skipped via
triple-quoted strings to iterate on pen_parade only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
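The cut_percentile cap can be sketched in a few lines of pure Python. This assumes the pen-parade curve is a list of (percentile, income) pairs sorted by percentile; `apply_cut` is a hypothetical helper, not the script's code. Because a pen parade is non-decreasing, capping every y at the value at the cut only changes points to the right of it, which is what produces the plateau.

```python
def apply_cut(points, cut_percentile):
    """Cap a pen-parade curve at its value at cut_percentile.

    `points` are (percentile, income) pairs sorted by percentile. The curve is
    non-decreasing, so min(y, y_at_cut) leaves everything left of the cut alone.
    """
    y_at_cut = next(y for x, y in points if x >= cut_percentile)
    capped = [(x, min(y, y_at_cut)) for x, y in points]
    # Once p99 sits above the cap, its label moves to a top-of-chart annotation.
    p99_as_top_annotation = cut_percentile < 99
    return capped, y_at_cut, p99_as_top_annotation


points = [(p, float(p)) for p in range(1, 101)]  # toy linear curve
capped, y_at_cut, p99_on_top = apply_cut(points, 95)
```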
Flip the default so pen_parade plots the full distribution unchanged unless the caller passes cut_percentile. The World/month example call sets cut_percentile=95 explicitly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Prepend a (x=0, y=0) row per hue group so each curve starts at the origin instead of at percentile 1.
- When cut_percentile is set, extend the plateau to p100 by appending a (x=100, y=y_at_cut) anchor after the y-clamp.
- Add a vertical white-to-transparent gradient band across the top of the chart (opaque at ~0.85*y_at_cut, clear at y_at_cut) so the line and fill dissolve into white instead of ending hard.
- Move the p99 annotation to the right-margin column at the top-right axes corner so it aligns with the other y-tick labels.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Make the fade piecewise: opaque white from the bottom of the band up through the plateau line at y_at_cut, fading to transparent only above it. This fully covers the faint plateau line that was peeking through the old linear gradient.
- Stop the gradient at x=99.75 so it doesn't bleed over the right-hand y-axis line at x=100.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- When cut_percentile is in effect, draw the right-hand y-axis line with an explicit plot() up to y_at_cut * 1.10 (with clip_on=False) so it reaches slightly above the cap, hinting that data continues above.
- Bump the curve linewidth from the default to 2.5 for better presence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Flip the right-margin arrows from → to ← so they point into the chart.
- Drop the World mean reference and add two new reference lines at the equivalent of $900/month and $500/month, defined in monthly terms and rescaled by period_factor / month_factor so the same real-world amounts shift correctly between day/month/year periods.
- Centralize the reference-line color in a REFERENCE_LINE_COLOR constant (slate #6c7a89) so all axhlines pick it up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
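The period_factor / month_factor rescaling is easiest to see with concrete factors. This sketch assumes the factors are days per period (day = 1, month = 365/12, year = 365). The script's actual period_factor values are not shown in this PR and may differ, and `rescale_monthly` is a hypothetical name.

```python
# Hypothetical days-per-period factors; the script's real values may differ.
PERIOD_FACTORS = {"day": 1.0, "month": 365 / 12, "year": 365.0}


def rescale_monthly(amount_per_month, period):
    """Express a monthly reference amount (e.g. $900/month) in `period` units."""
    month_factor = PERIOD_FACTORS["month"]
    period_factor = PERIOD_FACTORS[period]
    return amount_per_month * period_factor / month_factor


for period in ("day", "month", "year"):
    print(period, round(rescale_monthly(900, period), 2))
```

Defining the lines once in monthly terms keeps the real-world amount fixed, so only the display unit changes between day, month, and year charts.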
- Merge country-median pairs (currently Sweden+UK) whose medians are
within 5% into a single averaged line; assert if a configured pair
exceeds the tolerance so the chart author notices the divergence.
- If the merged country median is also within 5% of the world p90,
fold both into one label re-using the existing p90 axhline, no
duplicate line. Combined wording: "median income in {countries}, and
the income above which the richest 10% of the world live".
- Display-name map renders "United States" → "the USA" and
"United Kingdom" → "the UK" in the labels.
- Per-label anchor overrides position "median income in Sweden..." above
its tick and "richest 10%..." below (no-op when both fragments are
merged into one label).
- All reference lines dotted (linestyle=":"), drop the old high-income
poverty-line block and "poorest 50%" narrative.
- Reword the IPL and World median labels to the same shape as the others:
"← $X per {period}" / "← $X per {period} — the global median income".
- $900/month and $500/month reference lines defined in monthly terms and
rescaled by period_factor / month_factor.
- Pen parade now 1000x1000 (square), wider right margin (right=0.55),
wrap_width=36, curve linewidth=2.5, slate-blue reference color via a
central REFERENCE_LINE_COLOR constant.
- Fade band on the top tail of the chart is piecewise (opaque through
the plateau line, fades only above), with y_at_fade_floor at
y_at_cut * 0.88 by default. Right-hand y-axis line extends to
y_at_cut * 1.10 with clip_on=False, hinting at data continuing above.
- World pen parade call passes cut_percentile=95.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
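A pure-Python sketch of the pair-merge and p90-fold logic. Two points are assumptions: "within 5%" is measured relative to the larger of the two values, and the country names are joined with "and". The helper names are hypothetical, and the numbers below are toy values, not real medians.

```python
DISPLAY_NAMES = {"United States": "the USA", "United Kingdom": "the UK"}


def relative_gap(a, b):
    # Assumption: tolerance is relative to the larger value.
    return abs(a - b) / max(a, b)


def merge_pair(medians, pair, tol=0.05):
    """Average a configured country pair; fail loudly if they have diverged."""
    a, b = (medians[c] for c in pair)
    gap = relative_gap(a, b)
    assert gap <= tol, f"{pair} medians differ by {gap:.1%} (> {tol:.0%})"
    return (a + b) / 2


def fold_into_p90(merged_value, p90, tol=0.05):
    """True if the merged median should share the world p90 line and label."""
    return relative_gap(merged_value, p90) <= tol


def label_for(pair, folded):
    countries = " and ".join(DISPLAY_NAMES.get(c, c) for c in pair)
    text = f"median income in {countries}"
    if folded:
        text += ", and the income above which the richest 10% of the world live"
    return text


medians = {"Sweden": 100.0, "United Kingdom": 96.0}  # toy values
merged = merge_pair(medians, ("Sweden", "United Kingdom"))
print(merged, label_for(("Sweden", "United Kingdom"), fold_into_p90(merged, 100.0)))
```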
The previous setup positioned the labels via tick-label offsets, but matplotlib's tick-label renderer ignores transform offsets, so the per-line-count vertical nudge for multi-line wrapped labels never actually applied (the ← arrow was misaligned with the tick).

Hide the default tick labels (set_yticklabels with empty strings) and place each label manually with ax.annotate, using xytext in offset points so the per-n offset is honored.

Default labels: va="center" with offset -(n-1)*line_height/2 so the first line lines up with the tick for both single- and multi-line wrapped labels. The above/below overrides (Sweden+UK, richest 10%) use va="bottom" / va="top" with no offset. Collision detection is disabled for now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
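The -(n-1)*line_height/2 offset follows from how va="center" works. A block of n lines is centred on the tick, so its first line sits (n-1)*line_height/2 above it, and shifting the block down by that amount puts the first line on the tick. A sketch of that arithmetic using textwrap. `label_offset_points` and the 12-pt line height are assumptions; matplotlib's actual line height depends on font size and linespacing.

```python
import textwrap


def label_offset_points(label, wrap_width=36, line_height_pt=12.0):
    """Wrapped label text plus the vertical offset (points) for va="center".

    va="center" centres the whole block on the tick; moving it down by
    (n - 1) * line_height / 2 puts the first line's centre on the tick.
    """
    n = len(textwrap.wrap(label, width=wrap_width)) or 1
    return textwrap.fill(label, width=wrap_width), -(n - 1) * line_height_pt / 2


text, dy = label_offset_points(
    "median income in Sweden and the UK, and the income above which "
    "the richest 10% of the world live"
)
print(text, dy, sep="\n")
```

The offset would then be passed as the y part of `xytext` with `textcoords="offset points"` and `va="center"` in `ax.annotate`.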
Add a small red square bracket above IPL, the global median, and the
high-income poverty line, spanning from x=0 out to where the curve
crosses that y value. Marks the share of the world earning less than
each threshold. Bracket color matches the deep-red from
sns.color_palette("deep")[3]; height ~1.2% of the visible y-range.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce an axhline_over_curve helper that draws the dotted reference line only from where the curve crosses that y value out to the right edge (xmin = x_crossing / 100 in axes fraction). Apply it to all of the reference lines (IPL, $900, $500, World median, p90, p99, country medians, merged Sweden+UK groups). The white space above the curve, where the red square brackets live, now stays clean of dotted lines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add three labeled annotations above the IPL, global median, and high-income poverty brackets, each in 2-3 lines via VPacker / TextArea / AnnotationBbox so the bold title sits above the regular-weight body and both share a right edge anchored at the bracket's right tip.
- Poverty: the $900 bracket gets the World headcount share at $900/month (headcount_ratio_3000 pulled from complete_series and merged into df_main_indicators).
- Deep poverty: the global-median bracket gets the "poorer half of the world — 4 billion people — live on less than $X per period" copy in three lines.
- Extreme poverty: the IPL bracket gets the World headcount share at the IPL ($3/day → poverty_line "300", new df_pov_ipl merge) in three lines, right-aligned at the bracket end.
Also add a PPP_VERSION = 2021 constant and route the df_complete filters through it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Shade the area under the curve below each poverty line in the same
blue as the main fill (alpha 0.3, stacked) so the IPL / median /
high-income bands deepen progressively.
- Annotation copy: spaced em dashes used consistently; "Deep poverty"
reads "The poorer half of the world population — 4 billion people —
live on less than $X per period".
- Right-align the headers ("Poverty", "Deep poverty", "Extreme
poverty") via ha="right"/multialignment="right" so each block has a
single clean right edge across the bold title and the body lines.
- Anchor the AnnotationBbox's right edge at the bracket's left edge
(box_alignment=(1.0, 0.0)) with xybox=(-4, 0) so the label's right
edge sits just before the bracket — text → bracket flow without an
external gap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
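Why the stacked alpha-0.3 fills "deepen progressively": k identical layers of opacity a combine to 1 - (1 - a)^k. Each poverty fill covers everything left of its own crossing, so the region below the IPL sits under all three fills. This sketch ignores the main fill underneath, which is an assumption, and uses hypothetical helper names.

```python
def stacked_opacity(alpha, layers):
    """Combined opacity of `layers` identical fills of `alpha` drawn on top of each other."""
    return 1 - (1 - alpha) ** layers


def band_opacities(alpha=0.3):
    # Nested fills: left of the IPL crossing is covered 3 times,
    # IPL-to-median twice, median-to-high-income-line once.
    return {
        "below IPL": stacked_opacity(alpha, 3),
        "IPL to median": stacked_opacity(alpha, 2),
        "median to high-income line": stacked_opacity(alpha, 1),
    }


print({band: round(op, 3) for band, op in band_opacities().items()})
```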
- Guard the (x=0, y=0) per-hue anchor row behind `not log_scale` so log-scaled pen parades (Chile/Peru/Uruguay) actually render again; log(0) = -inf would otherwise blank the line geometry.
- Split the Poverty annotation into 3 lines (bold title, then "X% of the world population" + "live on less than $Y per period") to match the Deep/Extreme poverty annotations' shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
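A sketch of the guarded origin row, assuming each hue group's curve is a list of (x, y) pairs keyed by group name. `prepend_origin` is a hypothetical helper. On a log y-axis, y=0 has no finite position (in plain Python `math.log10(0)` even raises `ValueError`), so the row is skipped there.

```python
def prepend_origin(groups, log_scale):
    """Start each hue group's curve at (0, 0), except on a log-scaled y-axis."""
    if log_scale:
        # y = 0 cannot be placed on a log axis, so leave the curves untouched.
        return {hue: list(points) for hue, points in groups.items()}
    return {hue: [(0.0, 0.0)] + list(points) for hue, points in groups.items()}


groups = {"Chile": [(1, 0.4), (50, 12.0)]}  # toy values
print(prepend_origin(groups, log_scale=False))
```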
- styled_annotation now wraps the body text at wrap_width=32 chars via textwrap.fill, so callers don't need to embed \n breaks manually.
- All annotation boxes use box_alignment=(0.0, 0.0) with a uniform xybox=(-130, 0) leftward offset, so every block's LEFT edge sits at the same x position regardless of line count or text length.
- Drop the manual line splits from the three annotation call sites (text passed as one logical sentence each).
- Black-format the file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
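The text half of styled_annotation can be sketched with textwrap alone. The hypothetical `styled_annotation_text` returns the bold title and the wrapped body separately. In matplotlib these would presumably become two `TextArea` children of a `VPacker` inside an `AnnotationBbox` with `box_alignment=(0.0, 0.0)` and `xybox=(-130, 0)` (likely with `boxcoords="offset points"`); that plotting half is not reproduced here.

```python
import textwrap


def styled_annotation_text(title, body, wrap_width=32):
    """Return (title, wrapped_body); callers pass the body as one sentence."""
    return title, textwrap.fill(body, width=wrap_width)


title, body = styled_annotation_text(
    "Poverty", "X% of the world population live on less than $Y per month"
)
print(title, body, sep="\n")
```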
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nd fills
Replaces the integer-percentile snap (above[x].min()) with a shared interpolate_x_at_y helper, so red bracket end-caps and dotted reference lines meet the curve exactly. Adds interpolate=True to KDE area fills.
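The interpolate_x_at_y helper's signature is not shown. This is a minimal sketch assuming it takes parallel x/y sequences of a non-decreasing curve and does linear interpolation between the two samples that bracket the target. The integer snap it replaces would instead return the first sampled percentile whose y is at or above the target.

```python
def interpolate_x_at_y(xs, ys, y_target):
    """x where a non-decreasing curve first reaches y_target (linear interpolation).

    Returns None if the curve never reaches y_target. Signature is an assumption.
    """
    if y_target <= ys[0]:
        return xs[0]
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if y0 < y_target <= y1:  # y1 > y0 here, so no division by zero
            return x0 + (x1 - x0) * (y_target - y0) / (y1 - y0)
    return None


xs, ys = [0, 50, 100], [0, 10, 30]  # toy curve
print(interpolate_x_at_y(xs, ys, 5), interpolate_x_at_y(xs, ys, 20))
```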
Use raw (unrounded) reference values for positioning so the world median bracket lands at p=50 and matches the rounded $289 shown in labels. Build each poverty-band fill polygon with the exact crossing point appended, because matplotlib's interpolate=True doesn't apply when y1=0 is constant.
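A sketch of a poverty-band polygon with the exact crossing appended, assuming a non-decreasing curve that starts below the poverty line. `band_polygon` is a hypothetical name, and it inlines the interpolation so the block stands alone. The resulting vertices could be drawn with e.g. `ax.fill(*zip(*poly))`.

```python
def band_polygon(xs, ys, y_line):
    """Vertices of the area under the curve, left of where it reaches y_line.

    The exact crossing (x_cross, y_line) is appended before dropping back to the
    baseline, so the fill's right edge lands on the curve rather than at the last
    sampled point below the line.
    """
    for i in range(len(xs) - 1):
        x0, y0, x1, y1 = xs[i], ys[i], xs[i + 1], ys[i + 1]
        if y0 < y_line <= y1:
            x_cross = x0 + (x1 - x0) * (y_line - y0) / (y1 - y0)
            under = list(zip(xs[: i + 1], ys[: i + 1]))
            return [(xs[0], 0.0)] + under + [(x_cross, y_line), (x_cross, 0.0)]
    raise ValueError("curve never reaches y_line")


xs, ys = [0, 50, 100], [0, 10, 30]  # toy curve
poly = band_polygon(xs, ys, 20)
print(poly)
```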
…ti-year plots
- Replace the explicit x_axis_range tuple with share_x_axis: bool, which auto-computes the union x-range across years in a pre-pass.
- Add share_y_axis: bool, which locks every per-year SVG to the same peak density.
- Add row_by="year" to distributional_plots_per_row for one-country / multi-year stacked layouts.
- Add filename_suffix to distinguish the Sweden lognormal output set.
Previously share_x_axis set xlim = union of data extents, which clipped the KDE taper on the right. Now extend each year's bounds by cut*bw in log10 space (matching seaborn's default cut=3 and Scott's bandwidth) before taking the union, so the axis right edge lands exactly where the curve tapers to near-zero. Tick filter still caps visible labels at the largest log tick inside that range — for Sweden, axis runs to $263 but the last label is $200.
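A sketch of the extended shared x-range. It assumes unweighted samples and Scott's rule as implemented by `scipy.stats.gaussian_kde`: factor n^(-1/5) times the sample standard deviation (ddof=1), computed on log10 values. Seaborn then pads each side by cut * bw. Weighted thousand-bins data would use an effective n and weighted variance instead. The helper names are hypothetical, and the input values are toy numbers.

```python
import math
import statistics


def kde_extent_log10(values, cut=3.0):
    """(lo, hi) of a Gaussian KDE's drawn support in log10 space, seaborn-style."""
    logs = [math.log10(v) for v in values]
    bw = len(logs) ** (-1 / 5) * statistics.stdev(logs)  # Scott's rule, unweighted
    return min(logs) - cut * bw, max(logs) + cut * bw


def shared_x_range(values_by_year, cut=3.0):
    """Union of every year's padded extent, back in dollar units."""
    extents = [kde_extent_log10(v, cut) for v in values_by_year.values()]
    lo = min(e[0] for e in extents)
    hi = max(e[1] for e in extents)
    return 10 ** lo, 10 ** hi


lo, hi = shared_x_range({1820: [1, 10, 100], 2020: [10, 100, 1000]})
print(lo, hi)
```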
…SVGs
When share_x_axis or share_y_axis is on, switch from bbox_inches="tight" (which varies the SVG bounding box per figure based on visible content) to fixed subplots_adjust margins, so per-year SVGs can be stacked pixel-aligned.
Replaced by the 2026-data outputs.
…abels to integers
Removes the add_world_mean parameter, the world_mean_year computation, and the line/label/area branches from distributional_plots, distributional_plots_per_row, and pen_parade (where it was computed but never displayed). Also rounds reference-line labels via dollar_decimals (2 for day, 0 for month/year) to match the pen parade style.
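A sketch of the dollar_decimals rule. The real function's signature is not shown, so it is assumed here to take the period name; the thousands separator in `format_dollars` is also an assumption.

```python
def dollar_decimals(period):
    """2 decimals for daily amounts, 0 for monthly and yearly ones."""
    return 2 if period == "day" else 0


def format_dollars(value, period):
    # Nested format spec: the precision comes from dollar_decimals.
    return f"${value:,.{dollar_decimals(period)}f}"


print(format_dollars(3.0, "day"), format_dollars(289.4, "month"), format_dollars(10800, "year"))
```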
… IPL/high-income/world-median controls
- Add `add_high_income_pl`, `add_world_median` (per year, via df_main_indicators) and `filename_suffix` to distributional_plots_per_row.
- Pass the fill, add_ipl and add_multiple_lines_day arguments through to the year-rows helper, and swap the axvline loop for draw_area_under_curve so reference thresholds become shaded regions.
- Draw constant-x reference lines (IPL, high-income, world median when shared) with a figure-spanning Line2D in a blended transform so the line is continuous across stacked subplots (the per-axes axvline left a visible gap between rows).
- Share y across rows in distributional_plots_per_row(row_by="country").
- Apply the share_x_axis-style range, KDE clip, set_xlim, and tick filtering inside the year-rows helper.
Pick between hanging the rotated reference-line label inside axes[0] from its top edge (when the curve at the label's x leaves ≥50% headroom — e.g. Sweden 1820 at the IPL) or floating it just above axes[0] in the figure margin (when the curve fills axes[0] — e.g. Ethiopia at the IPL). Drops the previous "always above figure margin" placement that felt too far from the chart for low-density rows. Also: World mean kwarg removed, reference-line defaults flipped to None, scipy imports cleaned up.
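A sketch of the headroom test. It assumes axes[0]'s y-axis starts at 0 (a density axis) and that the curve's height at the label's x is already known. The return values "inside" and "above" are hypothetical labels for the two placements.

```python
def label_placement(curve_y_at_label_x, ylim_top, min_headroom=0.5):
    """Where to put a rotated reference-line label on the top row.

    "inside": hang from axes[0]'s top edge when at least `min_headroom` of the
    axes height is free above the curve at the label's x.
    "above": float just above axes[0] in the figure margin otherwise.
    """
    headroom = 1 - curve_y_at_label_x / ylim_top  # assumes ylim bottom == 0
    return "inside" if headroom >= min_headroom else "above"


print(label_placement(0.2, 1.0), label_placement(0.9, 1.0))
```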
…s years
The per-year loop in distributional_plots was multiplying add_multiple_lines_day by period_factor in place. On the second year the values were already in period units and got multiplied again, pushing both threshold fills past the data extent so they collapsed into the same polygon. Use a fresh local `scaled_lines` per iteration.
Also: revert the add_ipl/add_world_median defaults to "line" and set them explicitly to None for the separate Sweden lognormal call so the line stays out of those charts.
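A minimal reproduction of the in-place scaling bug and its fix. The function names and the factor of 30 are hypothetical; only the pattern (mutating the shared list vs. building a fresh `scaled_lines` each iteration) mirrors the commit.

```python
def scaled_thresholds_buggy(lines_per_day, period_factor, years):
    out = []
    for _ in years:
        for i in range(len(lines_per_day)):
            lines_per_day[i] *= period_factor  # mutates the caller's list every year
        out.append(list(lines_per_day))
    return out


def scaled_thresholds_fixed(lines_per_day, period_factor, years):
    out = []
    for _ in years:
        scaled_lines = [v * period_factor for v in lines_per_day]  # fresh each year
        out.append(scaled_lines)
    return out


print(scaled_thresholds_buggy([3, 30], 30, [1820, 2020]))
print(scaled_thresholds_fixed([3, 30], 30, [1820, 2020]))
```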
Summary
Rewires `PabloArriagada/distribution_generator/distribution_generator.py` to load its inputs from the public `external/poverty_inequality/latest/...` catalog paths instead of the versioned `garden/wb/.../world_bank_pip_legacy` and `garden/poverty_inequality/.../historical_poverty` URLs:

- The historical thousand-bins tables (`thousand_bins_interpolated_ginis`, `thousand_bins_interpolated_ginis_all_lognormal`) point at the new `external/poverty_inequality/latest/historical_poverty` step added in owid/etl#6160.
- The thousand-bins distribution reads `external/poverty_inequality/latest/thousand_bins_distribution`.
- The PIP percentiles and main indicators read the dimensional `external/poverty_inequality/latest/world_bank_pip/{percentiles,complete_series}` tables (also added in owid/etl#6160, "📊 Add poverty/inequality external steps and region columns").

A small filter/merge block in `run()` rebuilds the per-(country, year) flat shape that the existing plot functions expect. Underlying values are identical to the legacy tables when versions match; spot-checked World 2020: `mean=19.83`, `median=8.2`, `top1_thr=157.20`, `decile9_thr=50.0`, `headcount_ratio_3000=82.98`.

`NATIONAL_LINES_URL` (`harmonized_national_poverty_lines`) stays on `garden/...` because there is no external mirror for it.

Net effect: three pinned version constants disappear (`PIP_VERSION`, `THOUSAND_BINS_VERSION`, `THOUSAND_BINS_HISTORICAL_VERSION`); only `NATIONAL_LINES_VERSION` remains.

Test plan
```bash
uv run python PabloArriagada/distribution_generator/distribution_generator.py
```