perf(lexer): O(N^2) -> O(N) tokenizer — self-compile 320s -> 28s by zemo-g · Pull Request #16 · zemo-g/rail

zemo-g · 2026-06-16T03:55:35Z

The lexer used `length cs == 0` as its empty-check. `_rail_length` walks the whole list, so the main loop `tc`, which recurses once per token over the remaining chars, cost O(chars x tokens): about 2.7e10 steps on the 551K-char compile.rail, or ~297s, 92% of every self-compile. Switched 10 empty-checks to `== []` (O(1)). Output is byte-identical (same tokens).
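As a rough sanity check on the ~2.7e10 figure (a back-of-the-envelope estimate, not a measurement): assume `tc` runs once per token, that the T tokens are spread roughly evenly over the N input chars, and take T ≈ 100K from the token count quoted in the commit message. Each call then walks the remaining list once, so the total number of cells walked is approximately

$$
\sum_{i=0}^{T-1} N\left(1 - \frac{i}{T}\right) \approx \frac{N\,T}{2} \approx \frac{(5.51 \times 10^{5})(1 \times 10^{5})}{2} \approx 2.8 \times 10^{10},
$$

which is in line with the quoted ~2.7e10. The even-spacing assumption and the rounded token count are simplifications; the order of magnitude is what matters here.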

Measured: self-compile 320s -> 28s (~11x). 177/177 tests pass. The 2-pass self-host fixed point is byte-identical (rail_native reseeded). The lexer was always O(N^2); compile.rail growing to 551K chars made it dominant.

🤖 Generated with Claude Code

The lexer helpers used `length cs == 0` as their empty-check. `_rail_length` walks the whole list, so that test is O(remaining) PER recursion step. The main loop `tc` recurses once per token over the remaining char list, making it O(chars x tokens) ~ 2.7e10 on the 551K-char compile.rail: ~297s, 92% of the entire self-compile. `rev` (over ~100K tokens), `strip_nl_pp_loop`, and `lx_str`/`lx_col` (long string/word spans) compounded it.

Switched 10 empty-checks (tc, rev_acc, strip_nl_pp_loop, lx_str, lx_col, lx_skip, lx_pk, htint_acc, has + one `length (tail cs) > 0`) to `== []` / `!= []` (O(1)). Byte-identical: same tokens out -> same parse -> same codegen.

Measured: tokenize 297s -> sub-second; self-compile 320s -> 28s (~11x). 177/177 tests; 2-pass self-host fixed point byte-identical (rail_native reseeded).

Not audit-resolve-specific: the lexer was always O(N^2); compile.rail growing to 551K chars made it dominant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
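The effect described in the commit message can be illustrated with a small model. This is a sketch in Python, not Rail code: the cons-list encoding, the one-char-per-token simplification, and all function names below are assumptions made for illustration. It counts how many list cells the empty-check touches when it is implemented by walking the list versus by comparing against the empty list.

```python
# Model of a singly linked char list: None is the empty list,
# a non-empty list is a (head, tail) tuple.

def from_string(s):
    """Build a cons list from a string, iteratively."""
    cells = None
    for ch in reversed(s):
        cells = (ch, cells)
    return cells

def walk_length(cells, visits):
    """Length by walking every cell, as a linked-list length must.
    visits[0] accumulates the number of cells touched."""
    n = 0
    while cells is not None:
        n += 1
        visits[0] += 1
        cells = cells[1]
    return n

def tokenize_with_length_check(cells):
    """Hypothetical main loop using `length cs == 0` as the empty-check.
    Simplification: every char is its own token."""
    visits = [0]
    tokens = 0
    while walk_length(cells, visits) != 0:  # O(remaining) on every step
        tokens += 1
        cells = cells[1]
    return tokens, visits[0]

def tokenize_with_nil_check(cells):
    """Same loop using a direct comparison with the empty list."""
    checks = 0
    tokens = 0
    while True:
        checks += 1              # one O(1) check per step
        if cells is None:
            break
        tokens += 1
        cells = cells[1]
    return tokens, checks

if __name__ == "__main__":
    cells = from_string("a" * 1000)
    print(tokenize_with_length_check(cells))  # cells touched grows as n*(n+1)/2
    print(tokenize_with_nil_check(cells))     # checks grow as n+1
```

Both loops produce the same token count, mirroring the byte-identical claim; only the cost of the empty-check changes, from quadratic in total to linear.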

zemo-g merged commit 59ee1a8 into master Jun 16, 2026
3 checks passed

zemo-g deleted the fix/lexer-quadratic branch June 16, 2026 04:19

zemo-g merged 1 commit into master from fix/lexer-quadratic