feat(shape): quantization-aware refinement + placeholder R-D study #1

Merged: Jannchie merged 8 commits into main from feature/joint-refinement on Jun 13, 2026

@Jannchie

Owner

AS IS

arthash's shape modes fit primitives greedily and never revisit a placed shape. Separately, the placeholder formats this repo competes with (blurhash / thumbhash / sqip) have no academic evaluation, and the sub-300-byte regime they live in has no rate-distortion characterization — so "shape modes look better" rests on a couple of PSNR spot checks (README Benchmarks), not a study.

TO BE

One research line, two deliverables:

  1. Feature — SearchOptions.refine_passes (default 0, byte-format-preserving). Optional quantization-aware joint refinement across all four primitive modes: remove each shape, re-search against the shape-removed canvas, keep the replacement only if it lowers exact total SSE under wire-quantized parameters (a continuous-domain win can flip sign after quantization). Shared via common::refine_shapes; each mode adds search_<shape> + quantize_<shape>.

  2. Research scaffolding + write-up. scripts/paper/ (R-D benchmark, refinement ablation, entropy-coding headroom, weighted-objective PoC, encode-latency Pareto, a faithful Marwood ICIP'18 reimplementation, dataset fetch), the findings frozen in docs/RD_STUDY.md, and an ICIP draft in paper/main.tex.
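
The accept rule in item 1 can be sketched on a 1-D toy canvas. This is not the code in common::refine_shapes: `Span`, `quantize_span`, `render`, `total_sse`, and the `refine_shapes` signature below are hypothetical, and integer rounding stands in for the real wire quantization. The point it illustrates is that every comparison is made on the quantized render, so a candidate only replaces the incumbent if the decoder output gets strictly better.

```rust
/// Toy 1-D primitive: a constant-valued run of pixels (hypothetical stand-in for a shape).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    start: f64,
    len: f64,
    value: f64,
}

/// Assumed wire quantization: snap every parameter to the integer grid.
fn quantize_span(s: Span) -> Span {
    Span {
        start: s.start.round().max(0.0),
        len: s.len.round().max(1.0),
        value: s.value.round(),
    }
}

/// Render as a decoder would: quantize first, then paint in order (later spans overwrite).
fn render(spans: &[Span], n: usize) -> Vec<f64> {
    let mut canvas = vec![0.0; n];
    for s in spans {
        let q = quantize_span(*s);
        let a = (q.start as usize).min(n);
        let b = ((q.start + q.len) as usize).min(n);
        for px in &mut canvas[a..b] {
            *px = q.value;
        }
    }
    canvas
}

/// Exact total SSE of the quantized render against the target.
fn total_sse(target: &[f64], spans: &[Span]) -> f64 {
    render(spans, target.len())
        .iter()
        .zip(target)
        .map(|(r, t)| (r - t).powi(2))
        .sum()
}

/// Backfitting: for each slot, re-search against the canvas with that shape removed
/// and keep the replacement only if quantized total SSE strictly drops.
/// Returns the number of accepted replacements.
fn refine_shapes<S>(target: &[f64], spans: &mut [Span], passes: usize, mut search: S) -> usize
where
    S: FnMut(&[f64], &[f64]) -> Span,
{
    let mut accepted = 0;
    for _ in 0..passes {
        for i in 0..spans.len() {
            let before = total_sse(target, spans);
            let old = spans[i];
            let mut rest = spans.to_vec();
            rest.remove(i);
            let removed = render(&rest, target.len());
            spans[i] = search(target, removed.as_slice());
            if total_sse(target, spans) < before {
                accepted += 1;
            } else {
                spans[i] = old; // no quantized gain: restore the incumbent
            }
        }
    }
    accepted
}
```

With `passes = 0` the loop body never runs and the shapes are left untouched. For a target level of 3.6 and an incumbent value of 4.0, a candidate value of 3.4 is closer in the continuous domain (error 0.2 vs 0.4) but rounds to 3, which is further away, so the sketch rejects it.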

Key results (Kodak + CLIC; PSNR / SSIM / LPIPS / DISTS): geometric primitives Pareto-dominate blurhash / thumbhash and a faithful Marwood reimpl on perceptual metrics — triangle-12 @77 B matches Marwood @187 B on LPIPS (2.4× smaller) and encodes 189× faster than SQIP. Three ablations (refinement, perceptual weighting, entropy coding) localize the bottleneck to primitive expressiveness, not the objective or the serialization.

Verification

  • 125 byte-compat regression tests green; refine_passes=0 keeps output byte-identical (RNG stream untouched).
  • cargo clippy clean.
  • Marwood reimpl validated against the paper: it reproduces the reported magnitude (221 px / ~200 B → ~24 dB on simple content).

Blast radius: the only production-code change is the opt-in refinement; default-path encode/decode bytes are unchanged. Everything else is research tooling + docs (datasets and figures git-ignored).

Jannchie added 7 commits June 13, 2026 16:39
Opt-in SearchOptions.refine_passes (default 0) runs backfitting passes
after the greedy fit across all four primitive modes: remove each shape,
re-search against the shape-removed canvas, keep the replacement only when
it lowers exact total SSE. The accept test renders wire-quantized params,
so it judges decoder output -- a continuous-domain win can flip sign under
5-bit position / 4-bit radius / RGB565 quantization.
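
To make the bit widths above concrete, here is a minimal sketch of uniform n-bit quantization and RGB565 packing. The helper names are hypothetical, and the mapping (normalized [0, 1] inputs, round-to-nearest, bit-replication on decode) is an assumption for illustration, not necessarily what arthash's wire format does.

```rust
/// Quantize a value in [0, 1] to an unsigned code with `bits` bits (round to nearest).
fn quantize_unit(x: f64, bits: u32) -> u32 {
    let levels = (1u32 << bits) - 1;
    (x.clamp(0.0, 1.0) * levels as f64).round() as u32
}

/// Map an n-bit code back to [0, 1].
fn dequantize_unit(q: u32, bits: u32) -> f64 {
    let levels = (1u32 << bits) - 1;
    q as f64 / levels as f64
}

/// Pack 8-bit RGB into 16 bits: 5 bits red, 6 bits green, 5 bits blue (truncating).
fn pack_rgb565(r: u8, g: u8, b: u8) -> u16 {
    let r5 = (r as u16) >> 3;
    let g6 = (g as u16) >> 2;
    let b5 = (b as u16) >> 3;
    (r5 << 11) | (g6 << 5) | b5
}

/// Expand RGB565 back to 8-bit channels by replicating the high bits into the low bits.
fn unpack_rgb565(p: u16) -> (u8, u8, u8) {
    let r5 = (p >> 11) & 0x1f;
    let g6 = (p >> 5) & 0x3f;
    let b5 = p & 0x1f;
    (
        ((r5 << 3) | (r5 >> 2)) as u8,
        ((g6 << 2) | (g6 >> 4)) as u8,
        ((b5 << 3) | (b5 >> 2)) as u8,
    )
}
```

Under these assumptions a 5-bit position has 32 levels and a 4-bit radius has 16, and a color such as (200, 100, 50) decodes to (206, 101, 49). Errors of that size are what the accept test has to see.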

Default 0 preserves byte-identical output (RNG stream untouched); the 125
byte-compat regression tests stay green. The refinement loop is shared via
common::refine_shapes; each mode supplies search_<shape> + quantize_<shape>.
Reproducible scaffolding under scripts/paper/ (R-D benchmark, refinement
ablation, entropy-coding headroom, dataset fetch) and the write-up in
docs/RD_STUDY.md. Key findings: shape modes are Pareto-dominant on
LPIPS/DISTS below ~300 B (circle-4 at 20 B beats blurhash-9x9 at 166 B);
joint refinement improves PSNR but not perception, motivating a perceptual
objective next; entropy-coding headroom is <5%.
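
One simple way to estimate entropy-coding headroom is to compare the zeroth-order Shannon entropy of the encoded bytes with the 8 bits per byte actually spent. The sketch below assumes that definition; the study's script may measure headroom differently, and both function names are hypothetical.

```rust
/// Empirical zeroth-order entropy of a byte stream, in bits per byte.
fn entropy_bits_per_byte(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Fraction of the stream an ideal order-0 coder could save (0.0 means no headroom).
fn headroom(data: &[u8]) -> f64 {
    1.0 - entropy_bits_per_byte(data) / 8.0
}
```

On very short streams the empirical entropy is biased low, so an estimate like this is best computed over many concatenated encodings rather than a single placeholder.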

Image corpora and regenerated figures are git-ignored (CSVs already were).
Adds the weighted-objective PoC (scripts/paper/perceptual_poc.py) and its
negative result to RD_STUDY.md: edge / center / saliency per-pixel weighting
all fail to improve LPIPS on a uniform-RNG greedy circle fitter (best is
strong saliency at -0.5% LPIPS for -0.65 dB PSNR, and unstable per image).
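
For reference, a per-pixel weighted objective of the kind tested here can be sketched as follows. The PoC itself lives in scripts/paper/perceptual_poc.py; this Rust version, the `center_weights` falloff, and the mean-1 normalization are illustrative assumptions, not a port of that script.

```rust
/// Weighted SSE: each pixel's squared error is scaled by its weight.
fn weighted_sse(target: &[f64], approx: &[f64], weights: &[f64]) -> f64 {
    assert!(target.len() == approx.len() && target.len() == weights.len());
    target
        .iter()
        .zip(approx)
        .zip(weights)
        .map(|((t, a), w)| w * (t - a).powi(2))
        .sum()
}

/// Center-biased weights for a w x h image, normalized to mean 1 so that
/// weighted and unweighted SSE stay on a comparable scale.
fn center_weights(w: usize, h: usize, strength: f64) -> Vec<f64> {
    let (cx, cy) = ((w as f64 - 1.0) / 2.0, (h as f64 - 1.0) / 2.0);
    let dmax = (cx * cx + cy * cy).sqrt().max(f64::EPSILON);
    let raw: Vec<f64> = (0..h)
        .flat_map(|y| {
            (0..w).map(move |x| {
                let d = ((x as f64 - cx).powi(2) + (y as f64 - cy).powi(2)).sqrt();
                // Linear falloff from the center to the corners.
                1.0 + strength * (1.0 - d / dmax)
            })
        })
        .collect();
    let mean = raw.iter().sum::<f64>() / raw.len() as f64;
    raw.iter().map(|v| v / mean).collect()
}
```

With all weights equal to 1 this reduces to plain SSE, which is the baseline the weighted variants are compared against.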

With L2 refinement, perceptual weighting, and entropy coding all bounded,
the study reframes around a single thesis -- sub-300-byte placeholders are
limited by primitive expressiveness, not the objective or serialization --
and the paper takes a measurement positioning.
speed_benchmark.py measures pure encode latency per method on the Kodak
thumbnails and joins each method's mean LPIPS from rd_results_kodak.csv.
Key result (RD_STUDY §1.1): arthash shape modes own the fast-and-perceptual
lower-left corner; arthash triangle-12 encodes 189x faster than SQIP
(1.5 ms vs 284 ms) at ~20x smaller output — the integral-image hill-climb is
what buys it. The SQIP latency is taken from the existing same-machine js_cross
benchmark (its sharp dependency is broken locally); the blurhash latency comes
from its pure-Python reference implementation, so no claim rests on it.
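
As background on why integral images make the hill-climb cheap: a summed-area table answers any axis-aligned rectangle sum in four lookups, so a candidate's score does not require touching every covered pixel. The sketch below is a generic summed-area table, not arthash's implementation; `IntegralImage` and `flat_fill_sse` are hypothetical names.

```rust
/// Summed-area table with a zero border: entry (x, y) holds the sum of all
/// pixels in the rectangle [0, x) x [0, y).
struct IntegralImage {
    w: usize,
    h: usize,
    sums: Vec<f64>,
}

impl IntegralImage {
    fn new(pixels: &[f64], w: usize, h: usize) -> Self {
        assert_eq!(pixels.len(), w * h);
        let stride = w + 1;
        let mut sums = vec![0.0; stride * (h + 1)];
        for y in 0..h {
            let mut row = 0.0; // running sum of the current row
            for x in 0..w {
                row += pixels[y * w + x];
                sums[(y + 1) * stride + x + 1] = sums[y * stride + x + 1] + row;
            }
        }
        Self { w, h, sums }
    }

    /// Sum over the half-open rectangle [x0, x1) x [y0, y1) in O(1).
    fn rect_sum(&self, x0: usize, y0: usize, x1: usize, y1: usize) -> f64 {
        assert!(x0 <= x1 && x1 <= self.w && y0 <= y1 && y1 <= self.h);
        let s = self.w + 1;
        self.sums[y1 * s + x1] - self.sums[y0 * s + x1] - self.sums[y1 * s + x0]
            + self.sums[y0 * s + x0]
    }
}

/// SSE of the best constant fill over a region, given the region's pixel sum,
/// sum of squared pixels, and pixel count: sum(x^2) - (sum(x))^2 / n.
fn flat_fill_sse(sum: f64, sum_sq: f64, n: f64) -> f64 {
    sum_sq - sum * sum / n
}
```

Keeping one table over pixel values and a second over squared values gives both inputs to `flat_fill_sse` for any rectangle in constant time.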

Also ignore bench/*.png (reproducible figures) and bench/div2k/.
marwood_baseline.py reimplements Marwood et al. "Representing Images in 200
Bytes" (no official code exists): g×g grid vertices, implicit Delaunay
connectivity, K-color palette vertex indices, Gouraud fill, error-driven
greedy placement + palette coordinate descent, ideal-entropy byte model
(generous to the baseline). Validated to the paper's magnitude at 221px/200B.
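
The Gouraud fill step can be illustrated with plain barycentric interpolation of vertex colors inside one triangle. This is a generic sketch of that technique with hypothetical names, not an excerpt from marwood_baseline.py, and it leaves out the grid, palette, and Delaunay parts of the method.

```rust
type Point = (f64, f64);

/// Barycentric weights of `p` with respect to triangle (a, b, c);
/// None if the triangle is degenerate.
fn barycentric(p: Point, tri: [Point; 3]) -> Option<[f64; 3]> {
    let [a, b, c] = tri;
    let det = (b.1 - c.1) * (a.0 - c.0) + (c.0 - b.0) * (a.1 - c.1);
    if det.abs() < 1e-12 {
        return None;
    }
    let u = ((b.1 - c.1) * (p.0 - c.0) + (c.0 - b.0) * (p.1 - c.1)) / det;
    let v = ((c.1 - a.1) * (p.0 - c.0) + (a.0 - c.0) * (p.1 - c.1)) / det;
    Some([u, v, 1.0 - u - v])
}

/// Gouraud shading: blend the three vertex colors by the barycentric weights.
/// Returns None for points outside the triangle.
fn gouraud(p: Point, tri: [Point; 3], colors: [[f64; 3]; 3]) -> Option<[f64; 3]> {
    let w = barycentric(p, tri)?;
    if w.iter().any(|&wi| wi < -1e-9) {
        return None;
    }
    let mut out = [0.0; 3];
    for (wi, color) in w.iter().zip(colors.iter()) {
        for ch in 0..3 {
            out[ch] += wi * color[ch];
        }
    }
    Some(out)
}
```

Because every interior pixel is a smooth blend of three vertex colors, a mesh filled this way cannot represent a hard edge inside a triangle.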

Result (RD_STUDY §1.2): on Kodak, Marwood wins PSNR (MSE-optimal Gouraud
mesh) but loses LPIPS decisively -- arthash triangle-12 matches Marwood's
187-byte LPIPS at 77 bytes (2.4x smaller) and wins at every rate. Gouraud
smoothing discards the structure LPIPS rewards, same failure mode as blurhash.
The split is the study's strongest evidence that PSNR is the wrong metric for
sub-300-byte placeholders.
Complete first draft (paper/main.tex, IEEEtran conference) written from the
RD_STUDY results: primitives Pareto-dominate industrial formats and a faithful
Marwood reimpl on LPIPS/DISTS, the PSNR-vs-perception split, the 189x encode-
latency edge over SQIP, and the three bounding ablations. Figures are staged
from bench/ per paper/README.md (paper/figures/ git-ignored).
CITATION.cff (GitHub "Cite this repository") plus a Citation section in both
READMEs pointing at docs/RD_STUDY.md, scripts/paper/, and the paper/ draft.
main.tex gains a Limitations paragraph (single codec, two corpora, Marwood is
our reimpl, SQIP's perceptual point inferred not measured, LPIPS/DISTS are
proxies).
The TS wasm binding's parse_search built SearchOptions without the new
refine_passes field, breaking `cargo build` for arthash-wasm (E0063).
Mirror the PyO3 binding: read it from JS, default to the SearchOptions default.
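
A minimal sketch of the pattern behind this fix, assuming a simplified `SearchOptions`: only `refine_passes` and its default of 0 come from this PR, while `iterations`, `seed`, their defaults, the field types, and the `Option`-based `parse_search` signature are hypothetical stand-ins for the real JS-object parsing. Finishing the struct literal with `..defaults` means a field added later falls back to its default instead of triggering E0063.

```rust
#[derive(Debug, Clone, PartialEq)]
struct SearchOptions {
    iterations: u32,    // hypothetical field
    seed: u64,          // hypothetical field
    refine_passes: u32, // field added in this PR; the type here is assumed
}

impl Default for SearchOptions {
    fn default() -> Self {
        Self {
            iterations: 100, // hypothetical default
            seed: 0,         // hypothetical default
            refine_passes: 0,
        }
    }
}

/// Build options from optional caller-supplied values, falling back to the
/// SearchOptions default for anything the caller did not set.
fn parse_search(iterations: Option<u32>, refine_passes: Option<u32>) -> SearchOptions {
    let defaults = SearchOptions::default();
    SearchOptions {
        iterations: iterations.unwrap_or(defaults.iterations),
        refine_passes: refine_passes.unwrap_or(defaults.refine_passes),
        // Any field not listed above (here: seed) is taken from the defaults.
        ..defaults
    }
}
```
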
Jannchie merged commit f529288 into main on Jun 13, 2026
19 of 20 checks passed
Jannchie deleted the feature/joint-refinement branch on June 13, 2026 09:24