[Go] Skip territory flood fill for non-terminal states (3-12x faster rewards); int8 board history by gweber · Pull Request #1321 · sotetsuk/pgx

gweber · 2026-06-10T13:15:39Z

Problem

Game.rewards() computes Tromp-Taylor scores on every step, but the result is discarded for non-terminal states (lax.select(self.is_terminal(state), rewards, zeros)). Under vmap/jit the discarded computation is not free: the territory flood fill (lax.while_loop in _count_ji) runs as many rounds as the worst board in the batch needs. On a near-empty board that is ~2*size rounds, which makes rewards() the single most expensive component of a Go env step:

| batch 256, 19x19, CUDA | `rewards()` | `game.step()` for comparison |
|---|---|---|
| ply 4 | 0.56 ms | 0.16 ms |
| ply 30 | 0.30 ms | |
| ply 120 | 0.23 ms | |
| ply 240 | 0.17 ms | |
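The batching effect can be reproduced in isolation. The sketch below is not pgx code: `rounds_needed` is a hypothetical stand-in for a fill that needs a given number of rounds, used only to show how a vmapped `lax.while_loop` behaves when lanes converge at different times.

```python
import jax
import jax.numpy as jnp
from jax import lax


def rounds_needed(target):
    # Hypothetical stand-in for a flood fill that needs `target` rounds to converge.
    return lax.while_loop(lambda i: i < target, lambda i: i + 1, jnp.int32(0))


# Three "boards": two converge quickly, one needs 36 rounds (roughly 2*size on 19x19).
targets = jnp.array([2, 3, 36], dtype=jnp.int32)
per_board = jax.jit(jax.vmap(rounds_needed))(targets)

# Each lane reports its own count because finished lanes are frozen with a select.
# The batched loop itself keeps iterating until the slowest lane is done,
# so the whole batch pays for max(targets) rounds.
batch_rounds = int(per_board.max())
```

The per-lane results stay correct, which is why the cost is invisible in the outputs and only shows up in timing.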

Fix

Feed the flood fill a fully-occupied dummy board for non-terminal states (enable=False): it converges in one round, and only actually-terminal boards pay the real fill. Semantics are exact — the discarded scores were never observable, and terminal states use the unchanged algorithm.

rewards() drops to a flat ~0.06 ms across all phases (3–12x faster).
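A minimal sketch of the masking idea, assuming a 2D board encoded as 1 (own stone), -1 (opponent stone) and 0 (empty). `spread_round` and `count_territory` are hypothetical names and this is not the code in the PR; only the `enable` flag and the fully-occupied dummy board come from the description above. A round counter is carried along to make the convergence behavior visible.

```python
import jax
import jax.numpy as jnp
from jax import lax


def spread_round(b):
    # One fill round: empty points (0) next to an opponent-reachable point (-1) become -1.
    p = jnp.pad(b, 1)  # zero border, so edge points see no opponent outside the board
    opp = p == -1
    near_opp = opp[:-2, 1:-1] | opp[2:, 1:-1] | opp[1:-1, :-2] | opp[1:-1, 2:]
    grow = (b == 0) & near_opp
    return jnp.where(grow, -1, b), grow.any()


def count_territory(board, enable):
    # Non-terminal states (enable=False) get a fully occupied dummy board:
    # it has no empty points, so the first round changes nothing and the loop stops.
    b0 = jnp.where(enable, board, jnp.ones_like(board))

    def body(carry):
        b, _, rounds = carry
        b, changed = spread_round(b)
        return b, changed, rounds + 1

    init = (b0, jnp.array(True), jnp.int32(0))
    b, _, rounds = lax.while_loop(lambda c: c[1], body, init)
    # Empty points the opponent never reached count as territory.
    return (b == 0).sum(), rounds


size = 9
near_empty = jnp.zeros((size, size), jnp.int32).at[0, 0].set(-1)
boards = jnp.stack([near_empty, near_empty])
enable = jnp.array([True, False])
counts, rounds = jax.jit(jax.vmap(count_territory))(boards, enable)
# rounds is [17, 1]: the real fill crosses the 9x9 board, the dummy board stops at once.
```

In this sketch a batch that contains even one enabled lane still runs the long loop, so the gain comes from batches in which no board is terminal, which is the common case during play.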

The PR also stores board_history as int8 (its values are in {-1, 0, 1, 2}), which saves 8.7 KB per 19x19 state. That matters when states are carried as MCTS embeddings (AlphaZero-style tree search holds one state per node).
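The 8.7 KB figure can be checked with a few lines of arithmetic. The sketch assumes a history of 8 boards that was previously stored as int32; neither value is stated in this description, so treat both as assumptions.

```python
import jax.numpy as jnp

SIZE = 19
HISTORY_LENGTH = 8  # assumption: not stated in the PR description

# Assumed previous layout: int32, one row of size*size points per remembered board.
old = jnp.full((HISTORY_LENGTH, SIZE * SIZE), 2, dtype=jnp.int32)
new = old.astype(jnp.int8)

saved_bytes = old.nbytes - new.nbytes  # 8 * 361 * (4 - 1) bytes

# The four values named in the PR survive the narrower dtype unchanged.
values = jnp.array([-1, 0, 1, 2], dtype=jnp.int32)
round_trip_ok = bool(jnp.array_equal(values.astype(jnp.int8).astype(jnp.int32), values))
```

Under those assumptions the saving is 8664 bytes, which matches the quoted 8.7 KB.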

Validation

- 300-ply lockstep against the previous implementation (batch 64): board, rewards, terminal flags and observations bit-identical at every ply
- two-consecutive-pass terminal scoring identical
- force-enabled scoring on 256 dense random boards identical to the old scorer
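A lockstep check of this kind can be sketched generically. Everything below is illustrative: `first_divergence` and the toy step functions are hypothetical and do not use the pgx API. They only show the pattern of driving two implementations with the same actions and comparing selected fields at every ply.

```python
import jax.numpy as jnp


def first_divergence(step_a, step_b, state_a, state_b, actions, fields):
    # Step both implementations with identical actions.
    # Return (ply, field) for the first mismatch, or None if they never diverge.
    for ply, action in enumerate(actions):
        state_a = step_a(state_a, action)
        state_b = step_b(state_b, action)
        for name in fields:
            if not bool(jnp.array_equal(state_a[name], state_b[name])):
                return ply, name
    return None


def make_step(bad_action=None):
    # Toy step: place a 1 at `action`; optionally write -1 for one action to simulate a bug.
    def step(state, action):
        value = -1 if action == bad_action else 1
        board = state["board"].at[action].set(value)
        return {"board": board, "rewards": jnp.sign(board.sum()).astype(jnp.float32)}
    return step


def init(dtype):
    return {"board": jnp.zeros(9, dtype), "rewards": jnp.float32(0.0)}


actions = [3, 1, 4]
fields = ("board", "rewards")
# Same logic with int32 and int8 storage: values agree at every ply.
same = first_divergence(make_step(), make_step(), init(jnp.int32), init(jnp.int8), actions, fields)
# An injected bug on action 4 is caught at the ply where it first appears.
diff = first_divergence(make_step(), make_step(bad_action=4), init(jnp.int32), init(jnp.int8), actions, fields)
```

Note that `jnp.array_equal` compares values, so an int8 board and an int32 board with the same contents count as equal here; a stricter check would also compare dtypes for fields whose storage is not supposed to change.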

Alternatives considered

A shared connected-component labeling of the empty points (pointer jumping, one labeling for both colors — related to #1205's torch port and the sotetsuk/go-avoid-while branch) was implemented and benchmarked: it wins on sparse boards (0.15 ms at ply 4) but loses 2-4x on realistic mid-game boards, where empty regions are corridor-shaped (large component diameter) yet every empty point is adjacent to a stone, so the existing fill converges in 2-3 rounds. The fori_loop(2*size-2) approach from sotetsuk/go-avoid-while is also slower (0.39 ms vs 0.23 ms at ply 120) because it always pays the worst case. Masking out non-terminal states beats both without touching the algorithm.

…tory

rewards() computes Tromp-Taylor scores on every step, but the result is discarded for non-terminal states. Under vmap/jit the territory flood fill (lax.while_loop) still runs as many rounds as the worst board in the batch needs - on a near-empty board that is ~2*size rounds, making rewards() the single most expensive component of a Go env step (0.56 ms at ply 4 vs 0.16 ms for game.step itself, batch 256, 19x19).

Fix: feed the fill a fully-occupied dummy board for non-terminal states (enable=False), which converges in one round. Only actually-terminal boards pay the real fill. Semantics are exact: the discarded scores were never observable, terminal states use the unchanged algorithm. rewards() drops from 0.16-0.71 ms (phase-dependent) to a flat ~0.06 ms, 3-12x faster.

Also store board_history as int8 (values are in {-1,0,1,2}): 8.7 KB less per 19x19 env state, relevant when states are carried as MCTS embeddings.

Validation: 300-ply lockstep vs the previous implementation (board, rewards, terminal, observation bit-identical), two-pass terminal scoring, and force-enabled scoring on 256 dense boards identical to the old scorer.

A pointer-jumping CCL variant (one shared component labeling for both colors) was benchmarked and rejected: it wins on sparse boards but loses 2-4x on realistic mid-game boards, where empty regions are corridor-shaped (large diameter) yet every empty point is adjacent to a stone (old fill converges in 2-3 rounds).

(cherry picked from commit d3e4b50)

gweber wants to merge 1 commit into sotetsuk:main from gweber:go-lazy-scoring
