
[Go] Skip territory flood fill for non-terminal states (3-12x faster rewards); int8 board history #1321

Open
gweber wants to merge 1 commit into sotetsuk:main from gweber:go-lazy-scoring

gweber commented Jun 10, 2026


Problem

Game.rewards() computes Tromp-Taylor scores on every step, but the result is discarded for non-terminal states (lax.select(self.is_terminal(state), rewards, zeros)). Under vmap/jit the discarded computation is not free: vmap turns the territory flood fill (lax.while_loop in _count_ji) into one batched loop that keeps iterating while any board in the batch is still changing, so every step runs as many rounds as the worst board needs. On a near-empty board that is ~2*size rounds, which makes rewards() the single most expensive component of a Go env step:

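| ply (batch 256, 19x19, CUDA) | rewards() | game.step() (for comparison) |
|------------------------------|-----------|------------------------------|
| 4                            | 0.56 ms   | 0.16 ms                      |
| 30                           | 0.30 ms   |                              |
| 120                          | 0.23 ms   |                              |
| 240                          | 0.17 ms   |                              |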

Fix

Feed the flood fill a fully-occupied dummy board for non-terminal states (enable=False): the fill converges in one round, so only actually-terminal boards pay for the real fill. Semantics are exact: the discarded scores were never observable, and terminal states use the unchanged algorithm.

rewards() drops to a flat ~0.06 ms across all phases (3–12x faster).
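A minimal JAX sketch of why the mask helps, under stated assumptions: a 19x19 board stored as int8 with 1 = black, -1 = white, 0 = empty, and hypothetical helpers `_adjacent` and `reach_from` standing in for the real fill in `_count_ji` (they are not pgx code). With `enable=False` the fill runs on a board filled with `color` stones, one possible fully-occupied dummy board, so nothing can grow and the loop exits after a single round.

```python
import jax
import jax.numpy as jnp
from jax import lax

SIZE = 19  # assumed encoding: 1 = black, -1 = white, 0 = empty


def _adjacent(mask):
    """True where a point has a 4-neighbor that is True in `mask` (hypothetical helper)."""
    out = jnp.zeros_like(mask)
    out = out.at[:-1, :].set(out[:-1, :] | mask[1:, :])  # neighbor below
    out = out.at[1:, :].set(out[1:, :] | mask[:-1, :])   # neighbor above
    out = out.at[:, :-1].set(out[:, :-1] | mask[:, 1:])  # neighbor to the right
    out = out.at[:, 1:].set(out[:, 1:] | mask[:, :-1])   # neighbor to the left
    return out


def reach_from(board, color, enable):
    """Hypothetical flood fill: points reachable from `color` stones via empty points.

    Returns (reached mask, number of loop rounds executed for this board).
    """
    # Non-terminal boards (enable=False) are swapped for a fully-occupied dummy.
    board = jnp.where(enable, board, color).astype(board.dtype)
    empty = board == 0

    def cond(carry):
        return carry[1]  # keep iterating while the reached set still grew

    def body(carry):
        reached, _, rounds = carry
        grown = reached | (empty & _adjacent(reached))
        return grown, jnp.any(grown != reached), rounds + 1

    init = (board == color, jnp.array(True), jnp.int32(0))
    reached, _, rounds = lax.while_loop(cond, body, init)
    return reached, rounds


# Two copies of a near-empty board (one black stone in a corner);
# only the first is treated as terminal.
sparse = jnp.zeros((SIZE, SIZE), jnp.int8).at[0, 0].set(1)
boards = jnp.stack([sparse, sparse])
fill = jax.jit(jax.vmap(reach_from, in_axes=(0, None, 0)))

reached, rounds = fill(boards, 1, jnp.array([True, False]))
print(rounds)  # per-board rounds: 37 for the enabled board, 1 for the dummy
```

Because vmap turns the while_loop into one loop that iterates while any board's condition holds, the batch pays for its slowest enabled board (37 rounds here, roughly 2*size). Without the mask every board is effectively enabled, so a sparse early-game batch always pays that worst case; with it, a batch containing no terminal boards exits after one round.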

Also stores board_history as int8 (values are in {-1, 0, 1, 2}): 8.7 KB less per 19x19 state, which matters when states are carried as MCTS embeddings (AlphaZero-style tree search holds one state per node).
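A quick check of the 8.7 KB figure, assuming (this is not stated in the PR) that board_history holds 8 past boards of 361 points each and was previously stored as int32. The sketch also confirms that every value in {-1, 0, 1, 2} survives an int8 round trip.

```python
import numpy as np

HISTORY_LEN = 8   # assumption: number of boards kept in board_history
POINTS = 19 * 19  # 361 intersections on a 19x19 board

old = np.zeros((HISTORY_LEN, POINTS), dtype=np.int32)  # assumed previous dtype
new = old.astype(np.int8)
saved = old.nbytes - new.nbytes
print(saved)  # 11552 - 2888 = 8664 bytes, about 8.7 KB

# Every stored value fits in int8 without loss.
values = np.array([-1, 0, 1, 2], dtype=np.int32)
assert np.array_equal(values.astype(np.int8).astype(np.int32), values)
```

Going from int32 to int8 saves 3 bytes per stored point, so under a different history length the saving scales linearly.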

Validation

  • 300-ply lockstep against the previous implementation (batch 64): board, rewards, terminal flags and observations bit-identical at every ply
  • two-consecutive-pass terminal scoring identical
  • force-enabled scoring on 256 dense random boards identical to the old scorer

Alternatives considered

A shared connected-component labeling of the empty points (pointer jumping, one labeling for both colors; related to #1205's torch port and the sotetsuk/go-avoid-while branch) was implemented and benchmarked. It wins on sparse boards (0.15 ms at ply 4) but loses 2-4x on realistic mid-game boards, where empty regions are corridor-shaped (large component diameter) yet every empty point is adjacent to a stone, so the existing fill converges in 2-3 rounds. The fori_loop(2*size-2) approach from sotetsuk/go-avoid-while is also slower (0.39 ms vs 0.23 ms at ply 120) because it always pays the worst case. Masking out non-terminal states beats both without touching the algorithm.
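For readers unfamiliar with the rejected alternative, a minimal JAX sketch of the general idea: label the empty regions once with min-label propagation plus pointer jumping, then derive both colors' area scores from that single labeling. This is an illustration under assumptions (square board, 1 = black, -1 = white, 0 = empty; all helper names are hypothetical). It is not the code that was benchmarked here, and not necessarily how #1205 or sotetsuk/go-avoid-while implement it.

```python
import jax
import jax.numpy as jnp
from jax import lax


def _neighbors(x, fill):
    """Stack of the four 4-neighbor values of every point, padding off-board with `fill`."""
    p = jnp.pad(x, 1, constant_values=fill)
    return jnp.stack([p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]])


def label_empty_regions(board):
    """Label connected empty regions of a square board (0 = empty, 1 = black, -1 = white).

    Empty points start with their own flat index. Each round takes the minimum
    label over the point and its empty neighbors, then pointer-jumps
    (label = label[label]). Stones keep the sentinel n. At the fixpoint every
    region is labeled with its smallest flat index.
    """
    size = board.shape[0]
    n = size * size
    empty = board == 0
    init = jnp.where(empty, jnp.arange(n).reshape(size, size), n)

    def step(carry):
        lab, _ = carry
        new = jnp.where(empty, jnp.minimum(lab, _neighbors(lab, n).min(axis=0)), n)
        flat = new.reshape(-1)
        jumped = flat[jnp.minimum(flat, n - 1)].reshape(size, size)  # label[label]
        new = jnp.where(empty, jumped, n)
        return new, jnp.any(new != lab)

    lab, _ = lax.while_loop(lambda c: c[1], step, (init, jnp.array(True)))
    return lab


@jax.jit
def area_scores(board):
    """(black, white) area scores: stones plus empty regions bordering only that color."""
    n = board.size
    lab = label_empty_regions(board).reshape(-1)
    empty = (board == 0).reshape(-1)
    nbs = _neighbors(board, 0).reshape(4, -1)  # off-board counts as empty here
    touch_b = (empty & jnp.any(nbs == 1, axis=0)).astype(jnp.int32)
    touch_w = (empty & jnp.any(nbs == -1, axis=0)).astype(jnp.int32)
    # One labeling serves both colors: scatter-max the border flags per region.
    region_b = jnp.zeros(n + 1, jnp.int32).at[lab].max(touch_b)
    region_w = jnp.zeros(n + 1, jnp.int32).at[lab].max(touch_w)
    terr_b = empty & (region_b[lab] > 0) & (region_w[lab] == 0)
    terr_w = empty & (region_w[lab] > 0) & (region_b[lab] == 0)
    black = jnp.sum(board == 1) + jnp.sum(terr_b)
    white = jnp.sum(board == -1) + jnp.sum(terr_w)
    return jnp.stack([black, white])


# 5x5 example: black wall in column 1, white wall in column 4.
b = jnp.zeros((5, 5), jnp.int8).at[:, 1].set(1).at[:, 4].set(-1)
print(area_scores(b))  # column 0 is black territory; columns 2-3 touch both colors
```

Each round does a full-board neighbor minimum plus a gather, so one round costs more than one step of a plain dilation fill. The pointer jump shortens long label chains, which helps most when regions are large and open; that is consistent with the sparse-versus-mid-game split reported above, though the sketch itself makes no timing claims.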

…tory

rewards() computes Tromp-Taylor scores on every step, but the result is
discarded for non-terminal states. Under vmap/jit the territory flood fill
(lax.while_loop) still runs as many rounds as the worst board in the batch
needs - on a near-empty board that is ~2*size rounds, making rewards() the
single most expensive component of a Go env step (0.56 ms at ply 4 vs
0.16 ms for game.step itself, batch 256, 19x19).

Fix: feed the fill a fully-occupied dummy board for non-terminal states
(enable=False), which converges in one round. Only actually-terminal
boards pay the real fill. Semantics are exact: the discarded scores were
never observable, terminal states use the unchanged algorithm.

rewards() drops from 0.16-0.71 ms (phase-dependent) to a flat ~0.06 ms,
3-12x faster.

Also store board_history as int8 (values are in {-1,0,1,2}): 8.7 KB less
per 19x19 env state, relevant when states are carried as MCTS embeddings.

Validation: 300-ply lockstep vs the previous implementation (board,
rewards, terminal, observation bit-identical), two-pass terminal scoring,
and force-enabled scoring on 256 dense boards identical to the old scorer.

A pointer-jumping CCL variant (one shared component labeling for both
colors) was benchmarked and rejected: it wins on sparse boards but loses
2-4x on realistic mid-game boards, where empty regions are corridor-shaped
(large diameter) yet every empty point is adjacent to a stone (old fill
converges in 2-3 rounds).

(cherry picked from commit d3e4b50)