feat(shape): quantization-aware refinement + placeholder R-D study #1
Merged
Opt-in SearchOptions.refine_passes (default 0) runs backfitting passes after the greedy fit across all four primitive modes: remove each shape, re-search against the shape-removed canvas, keep the replacement only when it lowers exact total SSE. The accept test renders wire-quantized params, so it judges decoder output -- a continuous-domain win can flip sign under 5-bit position / 4-bit radius / RGB565 quantization. Default 0 preserves byte-identical output (RNG stream untouched); the 125 byte-compat regression tests stay green. The refinement loop is shared via common::refine_shapes; each mode supplies search_<shape> + quantize_<shape>.
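The backfitting loop described above can be sketched in a few lines. This is a toy illustration, not the code in common::refine_shapes: the 32x32 grayscale canvas, opaque in-order painting, the random candidate search, and the exact quantization grid are all assumptions made for the example, and `search_circle` / `quantize_circle` here are hypothetical stand-ins for the per-mode search_<shape> / quantize_<shape> hooks.

```python
import random

W = H = 32  # toy canvas size (assumption, chosen so 5-bit positions map 1:1 to pixels)

def quantize_circle(c):
    """Snap (x, y, r, v) to a hypothetical wire grid: 5-bit position, 4-bit radius, 5-bit gray."""
    x, y, r, v = c
    qx = min(31, max(0, round(x)))
    qy = min(31, max(0, round(y)))
    qr = min(16, max(1, round(r)))                # 16 radius levels: 1..16
    qv = round(min(1.0, max(0.0, v)) * 31) / 31   # 32 gray levels
    return (qx, qy, qr, qv)

def render(shapes, background=0.5):
    """Paint quantized circles in order; later shapes overwrite earlier ones (assumption)."""
    img = [[background] * W for _ in range(H)]
    for x, y, r, v in shapes:
        for py in range(max(0, y - r), min(H, y + r + 1)):
            for px in range(max(0, x - r), min(W, x + r + 1)):
                if (px - x) ** 2 + (py - y) ** 2 <= r * r:
                    img[py][px] = v
    return img

def sse(a, b):
    """Exact total sum of squared errors between two images."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def search_circle(target, others, slot, rng, tries=32):
    """Random search for a replacement at `slot`; every candidate is scored after quantization."""
    best, best_err = None, float("inf")
    for _ in range(tries):
        cand = quantize_circle((rng.uniform(0, W - 1), rng.uniform(0, H - 1),
                                rng.uniform(1, 16), rng.random()))
        err = sse(render(others[:slot] + [cand] + others[slot:]), target)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

def refine_shapes(target, shapes, passes, rng):
    """Remove each shape, re-search its slot, keep the replacement only if total SSE drops."""
    shapes = list(shapes)
    total = sse(render(shapes), target)
    for _ in range(passes):
        for i in range(len(shapes)):
            others = shapes[:i] + shapes[i + 1:]
            cand, err = search_circle(target, others, i, rng)
            if err < total:  # accept test runs on quantized (decoder-visible) output
                shapes[i], total = cand, err
    return shapes, total
```

With `passes=0` the loop body never runs, so the shapes come back unchanged and the RNG is never drawn from, which is the property the byte-compat guarantee relies on. Because a replacement is accepted only on a strict decrease, total SSE is non-increasing across passes.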
Reproducible scaffolding under scripts/paper/ (R-D benchmark, refinement ablation, entropy-coding headroom, dataset fetch) and the write-up in docs/RD_STUDY.md. Key findings: shape modes are Pareto-dominant on LPIPS/DISTS below ~300 B (circle-4 at 20 B beats blurhash-9x9 at 166 B); joint refinement improves PSNR but not perception, motivating a perceptual objective next; entropy-coding headroom is <5%. Image corpora and regenerated figures are git-ignored (CSVs already were).
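For readers unfamiliar with the term, "Pareto-dominant" on a rate-distortion plot means no other operating point is at least as small and at least as good, with one of the two strictly better. A minimal sketch of that filter follows; the function name and the sample points are hypothetical placeholders, not values from the study's CSVs.

```python
def pareto_front(points):
    """Keep points that no other point dominates; lower is better on both axes.

    points: list of (name, size_bytes, distortion) tuples.
    """
    front = []
    for name, size, dist in points:
        dominated = any(
            (s <= size and d <= dist) and (s < size or d < dist)
            for _, s, d in points
        )
        if not dominated:
            front.append((name, size, dist))
    return sorted(front, key=lambda p: p[1])

# Placeholder operating points (made up for illustration only).
sample = [("a", 20, 0.60), ("b", 166, 0.65), ("c", 77, 0.50), ("d", 187, 0.50)]
front = pareto_front(sample)
```

Here "b" drops out because "a" is both smaller and lower-distortion, and "d" drops out because "c" reaches the same distortion in fewer bytes.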
Adds the weighted-objective PoC (scripts/paper/perceptual_poc.py) and its negative result to RD_STUDY.md: edge / center / saliency per-pixel weighting all fail to improve LPIPS on a uniform-RNG greedy circle fitter (best is strong saliency at -0.5% LPIPS for -0.65 dB PSNR, and unstable per image). With L2 refinement, perceptual weighting, and entropy coding all bounded, the study reframes around a single thesis -- sub-300-byte placeholders are limited by primitive expressiveness, not the objective or serialization -- and the paper takes a measurement positioning.
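As a rough illustration of what a per-pixel weighted objective looks like, the sketch below builds a center-emphasis weight map and uses it in place of plain SSE. It is not taken from perceptual_poc.py: the Gaussian falloff, the `strength` parameter, and the mean-1 normalization are assumptions chosen for the example.

```python
import math

def center_weights(w, h, strength=1.0):
    """Weight map that emphasizes the image center, normalized to mean 1.

    strength=0 gives uniform weights, so the weighted objective reduces to plain SSE.
    """
    cx, cy = (w - 1) / 2, (h - 1) / 2
    sigma = max(w, h) / 4
    raw = [
        [1.0 + strength * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(w)]
        for y in range(h)
    ]
    mean = sum(map(sum, raw)) / (w * h)
    return [[v / mean for v in row] for row in raw]

def weighted_sse(a, b, weights):
    """Sum of squared errors with a per-pixel weight on each term."""
    return sum(
        wt * (pa - pb) ** 2
        for ra, rb, rw in zip(a, b, weights)
        for pa, pb, wt in zip(ra, rb, rw)
    )
```

Edge or saliency weighting would swap in a different weight map while leaving `weighted_sse` unchanged; the fitter then minimizes this quantity instead of unweighted SSE.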
speed_benchmark.py measures pure encode latency per method on the Kodak thumbnails and joins each method's mean LPIPS from rd_results_kodak.csv. Key result (RD_STUDY §1.1): arthash shape modes own the fast-and-perceptual lower-left corner; arthash triangle-12 encodes 189x faster than SQIP (1.5 ms vs 284 ms) at ~20x smaller output — the integral-image hill-climb is what buys it. SQIP comes from the existing same-machine js_cross benchmark (its sharp dep is broken locally); blurhash's latency is its pure-Python reference impl and is not leaned on. Also ignore bench/*.png (reproducible figures) and bench/div2k/.
marwood_baseline.py reimplements Marwood et al. "Representing Images in 200 Bytes" (no official code exists): g×g grid vertices, implicit Delaunay connectivity, K-color palette vertex indices, Gouraud fill, error-driven greedy placement + palette coordinate descent, ideal-entropy byte model (generous to the baseline). Validated to the paper's magnitude at 221px/200B. Result (RD_STUDY §1.2): on Kodak, Marwood wins PSNR (MSE-optimal Gouraud mesh) but loses LPIPS decisively -- arthash triangle-12 matches Marwood's 187-byte LPIPS at 77 bytes (2.4x smaller) and wins at every rate. Gouraud smoothing discards the structure LPIPS rewards, same failure mode as blurhash. The split is the study's strongest evidence that PSNR is the wrong metric for sub-300-byte placeholders.
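An "ideal-entropy byte model" can be read as charging each symbol its Shannon information under the empirical distribution, which is a lower bound no real coder beats. The sketch below shows that calculation for a stream of palette indices; the function name is hypothetical and marwood_baseline.py may differ in detail. The bound ignores the cost of transmitting the distribution itself, which is one reason such a model is generous to the baseline.

```python
import math
from collections import Counter

def ideal_entropy_bytes(symbols):
    """Shannon lower bound, in bytes, for coding `symbols` under their empirical distribution."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    # Each occurrence of a symbol with probability p costs -log2(p) bits.
    bits = -sum(c * math.log2(c / n) for c in counts.values())
    return bits / 8

# Sixteen indices drawn uniformly from a 4-color palette: 2 bits each, 32 bits total.
uniform_cost = ideal_entropy_bytes([0, 1, 2, 3] * 4)
```

A skewed index stream costs less than the uniform case, and a constant stream costs zero bytes under this model.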
Complete first draft (paper/main.tex, IEEEtran conference) written from the RD_STUDY results: primitives Pareto-dominate industrial formats and a faithful Marwood reimpl on LPIPS/DISTS, the PSNR-vs-perception split, the 189x encode-latency edge over SQIP, and the three bounding ablations. Figures are staged from bench/ per paper/README.md (paper/figures/ git-ignored).
CITATION.cff (GitHub "Cite this repository") plus a Citation section in both READMEs pointing at docs/RD_STUDY.md, scripts/paper/, and the paper/ draft. main.tex gains a Limitations paragraph (single codec, two corpora, Marwood is our reimpl, SQIP's perceptual point inferred not measured, LPIPS/DISTS are proxies).
The TS wasm binding's parse_search built SearchOptions without the new refine_passes field, breaking `cargo build` for arthash-wasm (E0063). Mirror the PyO3 binding: read it from JS, default to the SearchOptions default.
AS IS
arthash's shape modes fit primitives greedily and never revisit a placed shape. Separately, the placeholder formats this repo competes with (blurhash / thumbhash / sqip) have no academic evaluation, and the sub-300-byte regime they live in has no rate-distortion characterization — so "shape modes look better" rests on a couple of PSNR spot checks (README Benchmarks), not a study.
TO BE
One research line, two deliverables:
Feature: SearchOptions.refine_passes (default 0, byte-format-preserving). Optional quantization-aware joint refinement across all four primitive modes: remove each shape, re-search against the shape-removed canvas, keep the replacement only if it lowers exact total SSE under wire-quantized parameters (a continuous-domain win can flip sign after quantization). Shared via common::refine_shapes; each mode adds search_<shape> + quantize_<shape>.
Research scaffolding + write-up: scripts/paper/ (R-D benchmark, refinement ablation, entropy-coding headroom, weighted-objective PoC, encode-latency Pareto, a faithful Marwood ICIP'18 reimplementation, dataset fetch), the findings frozen in docs/RD_STUDY.md, and an ICIP draft in paper/main.tex.
Key results (Kodak + CLIC; PSNR / SSIM / LPIPS / DISTS): geometric primitives Pareto-dominate blurhash / thumbhash and a faithful Marwood reimpl on perceptual metrics. triangle-12 @77 B matches Marwood @187 B on LPIPS (2.4× smaller) and encodes 189× faster than SQIP. Three ablations (refinement, perceptual weighting, entropy coding) localize the bottleneck to primitive expressiveness, not the objective or the serialization.
Verification
refine_passes=0 keeps output byte-identical (RNG stream untouched).
cargo clippy clean.
Blast radius: the only production-code change is the opt-in refinement; default-path encode/decode bytes are unchanged. Everything else is research tooling + docs (datasets and figures git-ignored).