Update: make Qwen3 decode inputs deterministic #106
ndleslx wants to merge 1 commit into hw-native-sys:main from
Conversation
- Add fixed seed support to the Qwen3 decode tensor initializers
- Keep init callables self-contained for golden_writer serialization
- Rename the scope-1 projection chunk identifiers for readability
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough: Adds deterministic seeding to the Qwen3 example tensor initializers via a module-level SEED constant.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Code Review
This pull request implements deterministic tensor initialization for the Qwen3 32B decode model and its scope-specific files by introducing a SEED constant and applying manual seed offsets. It also renames SCOPE1_K_CHUNK to PROJ_K_CHUNK for improved clarity. Review feedback identifies logic discrepancies in weight scaling for scope 1, inconsistent seed offsets across different scopes compared to the main model, and the need for lazy torch imports within builder functions to follow project performance standards.
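The seeding pattern summarized above can be sketched with the standard library's `random` module standing in for torch; every name and value below is illustrative, not taken from the PR.

```python
# Hypothetical sketch of the reviewed pattern: a module-level seed, a fixed
# per-tensor offset, and the heavy import deferred into the builder itself.
SEED = 0  # stand-in for the module-level constant the PR introduces

def make_init(offset, rows, cols):
    """Build a self-contained initializer: everything it needs is local."""
    def init():
        import random  # lazy import, mirroring the project's `import torch` convention
        rng = random.Random(SEED + offset)
        return [[rng.random() for _ in range(cols)] for _ in range(rows)]
    return init

init_wq = make_init(2, 4, 4)

# Re-running the callable reseeds from scratch, so the output is identical.
assert init_wq() == init_wq()
```

Because each callable reseeds on every invocation, a serialized copy of the initializer reproduces the same tensor wherever it runs.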
```python
def init_wq():
    torch.manual_seed(seed + 2)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5


def init_wk():
    torch.manual_seed(seed + 3)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5


def init_wv():
    torch.manual_seed(seed + 4)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5
```
There is a logic discrepancy in the initialization of wq, wk, and wv between this scope-specific file and the full model implementation in qwen3_32b_decode.py. In the full model, these weights are initialized using torch.rand(...) / scale. Additionally, following the project convention, torch should be imported lazily within these builder functions to avoid import overhead.
```python
def init_wq():
    import torch
    torch.manual_seed(seed + 2)
    return torch.rand(hidden_size, hidden_size) / hidden_size ** 0.5


def init_wk():
    import torch
    torch.manual_seed(seed + 3)
    return torch.rand(hidden_size, kv_hidden) / hidden_size ** 0.5


def init_wv():
    import torch
    torch.manual_seed(seed + 4)
    return torch.rand(hidden_size, kv_hidden) / hidden_size ** 0.5
```

References
- Lazy imports for 'torch' and 'pypto.runtime' are a project convention to avoid import overhead when only builder functions are used.
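The distribution mismatch flagged above can be made concrete with a stdlib stand-in for `torch.rand` (function names and the hidden size here are illustrative): under the same seed, every element of the scope-1 variant differs from the full-model variant by a constant 0.5 / sqrt(hidden).

```python
import random

HIDDEN = 16  # illustrative hidden size

def full_model_wq(seed):
    # Full model: samples in [0, 1) / sqrt(hidden).
    rng = random.Random(seed + 2)
    return [rng.random() / HIDDEN ** 0.5 for _ in range(HIDDEN)]

def scope1_wq(seed):
    # Scope 1 before the fix: samples in [-0.5, 0.5) / sqrt(hidden).
    rng = random.Random(seed + 2)
    return [(rng.random() - 0.5) / HIDDEN ** 0.5 for _ in range(HIDDEN)]

# Same seed, same offset, but each element is shifted by 0.5 / sqrt(HIDDEN).
delta = 0.5 / HIDDEN ** 0.5
assert all(abs((f - s) - delta) < 1e-12
           for f, s in zip(full_model_wq(0), scope1_wq(0)))
```

The shift is deterministic, so a seeded cross-check between the two files would never converge without aligning the expressions.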
```python
def init_seq_lens():
    if use_max_seq:
        return torch.full((batch,), max_seq, dtype=torch.int32)
    torch.manual_seed(seed)
    return torch.randint(1, max_seq + 1, (batch,), dtype=torch.int32)


def init_q_proj():
    torch.manual_seed(seed + 1)
    return torch.rand(batch, hidden) - 0.5


def init_k_proj():
    torch.manual_seed(seed + 2)
    return torch.rand(batch, kv_hidden) - 0.5


def init_v_proj():
    torch.manual_seed(seed + 3)
    return torch.rand(batch, kv_hidden) - 0.5


def init_rope_cos():
    torch.manual_seed(seed + 4)
    return torch.rand(max_seq, head_dim) - 0.5


def init_rope_sin():
    torch.manual_seed(seed + 5)
    return torch.rand(max_seq, head_dim) - 0.5


def init_k_cache():
    torch.manual_seed(seed + 6)
    return torch.rand(cache_rows, head_dim) - 0.5


def init_v_cache():
    torch.manual_seed(seed + 7)
    return torch.rand(cache_rows, head_dim) - 0.5
```
The seed offsets used here for shared tensors are inconsistent with the offsets used in the full model (qwen3_32b_decode.py). To ensure synchronization, these offsets should be updated. Furthermore, please use lazy imports for torch inside these initialization functions to adhere to the repository's performance guidelines.
```python
def init_seq_lens():
    import torch
    if use_max_seq:
        return torch.full((batch,), max_seq, dtype=torch.int32)
    torch.manual_seed(seed + 5)
    return torch.randint(1, max_seq + 1, (batch,), dtype=torch.int32)


def init_q_proj():
    import torch
    torch.manual_seed(seed + 14)
    return torch.rand(batch, hidden) - 0.5


def init_k_proj():
    import torch
    torch.manual_seed(seed + 15)
    return torch.rand(batch, kv_hidden) - 0.5


def init_v_proj():
    import torch
    torch.manual_seed(seed + 16)
    return torch.rand(batch, kv_hidden) - 0.5


def init_rope_cos():
    import torch
    torch.manual_seed(seed + 6)
    return torch.rand(max_seq, head_dim) - 0.5


def init_rope_sin():
    import torch
    torch.manual_seed(seed + 7)
    return torch.rand(max_seq, head_dim) - 0.5


def init_k_cache():
    import torch
    torch.manual_seed(seed + 8)
    return torch.rand(cache_rows, head_dim) - 0.5


def init_v_cache():
    import torch
    torch.manual_seed(seed + 9)
    return torch.rand(cache_rows, head_dim) - 0.5
```

References
- Lazy imports for 'torch' and 'pypto.runtime' are a project convention to avoid import overhead when only builder functions are used.
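One way to keep the scope files and the full model from drifting apart again (a hypothetical refactor, not code from the PR) is a single offset table that every file imports; the values below are the ones used in the suggested fix above.

```python
# Hypothetical shared offset table; offsets taken from the suggested fix.
SEED_OFFSETS = {
    "seq_lens": 5,
    "rope_cos": 6,
    "rope_sin": 7,
    "k_cache": 8,
    "v_cache": 9,
    "q_proj": 14,
    "k_proj": 15,
    "v_proj": 16,
}

def tensor_seed(base_seed, name):
    """Look up the deterministic seed for a named tensor."""
    return base_seed + SEED_OFFSETS[name]

# Distinct offsets guarantee a distinct random stream per tensor.
assert len(set(SEED_OFFSETS.values())) == len(SEED_OFFSETS)
```

Each scope file would then call `tensor_seed(seed, "q_proj")` instead of hard-coding `seed + 14`, so a renumbering only needs to happen in one place.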
```python
def init_attn_out():
    torch.manual_seed(seed)
    return torch.rand(batch, hidden_size) - 0.5


def init_hidden_states():
    torch.manual_seed(seed + 1)
    return torch.rand(batch, hidden_size) - 0.5


def init_wo():
    torch.manual_seed(seed + 2)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5


def init_post_rms_weight():
    return torch.ones(1, hidden_size)


def init_w_gate():
    torch.manual_seed(seed + 3)
    return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5


def init_w_up():
    torch.manual_seed(seed + 4)
    return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5


def init_w_down():
    torch.manual_seed(seed + 5)
    return (torch.rand(intermediate_size, hidden_size) - 0.5) / intermediate_size ** 0.5
```
The seed offsets for shared tensors do not match the offsets in qwen3_32b_decode.py. This prevents the isolated scope test from using the same data as the full model. Additionally, ensure torch is imported lazily within each builder function to follow project conventions.
```python
def init_attn_out():
    import torch
    torch.manual_seed(seed + 14)
    return torch.rand(batch, hidden_size) - 0.5


def init_hidden_states():
    import torch
    torch.manual_seed(seed)
    return torch.rand(batch, hidden_size) - 0.5


def init_wo():
    import torch
    torch.manual_seed(seed + 10)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5


def init_post_rms_weight():
    import torch
    return torch.ones(1, hidden_size)


def init_w_gate():
    import torch
    torch.manual_seed(seed + 11)
    return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5


def init_w_up():
    import torch
    torch.manual_seed(seed + 12)
    return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5


def init_w_down():
    import torch
    torch.manual_seed(seed + 13)
    return (torch.rand(intermediate_size, hidden_size) - 0.5) / intermediate_size ** 0.5
```

References
- Lazy imports for 'torch' and 'pypto.runtime' are a project convention to avoid import overhead when only builder functions are used.
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/models/qwen3/qwen3_32b_decode.py (1)
446-455: Consider plumbing `seed` through `compile_and_run()`.
`build_tensor_specs()` is configurable now, but the main example entrypoint still always goes through the module default. Surfacing the new parameter there would make non-default reproductions easier without rebuilding the specs manually.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/models/qwen3/qwen3_32b_decode.py` around lines 446 - 455, The example entrypoint should accept and pass the seed into build_tensor_specs via compile_and_run so non-default seeds are respected: update the main/example function that calls compile_and_run (and any wrapper that constructs tensor specs) to accept an optional seed parameter and forward it into build_tensor_specs(seed=seed) when creating specs; change the compile_and_run invocation signature or its caller to accept and propagate this seed rather than relying on the module-level SEED default, and ensure any tests or example CLI parsing also expose the seed argument.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/models/qwen3/qwen3_32b_decode_scope1.py`:
- Around line 177-187: The init_wq/init_wk/init_wv functions use
(torch.rand(...) - 0.5) / hidden_size**0.5 which yields [-0.5,0.5)/sqrt(hidden)
and thus diverges from the full-decode initialization; change them to
torch.rand(...) / hidden_size**0.5 so they sample [0,1)/sqrt(hidden) with the
same seed offsets (seed+2/3/4) so scope-1 weights match the full decode example.
---
Nitpick comments:
In `@examples/models/qwen3/qwen3_32b_decode.py`:
- Around line 446-455: The example entrypoint should accept and pass the seed
into build_tensor_specs via compile_and_run so non-default seeds are respected:
update the main/example function that calls compile_and_run (and any wrapper
that constructs tensor specs) to accept an optional seed parameter and forward
it into build_tensor_specs(seed=seed) when creating specs; change the
compile_and_run invocation signature or its caller to accept and propagate this
seed rather than relying on the module-level SEED default, and ensure any tests
or example CLI parsing also expose the seed argument.
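The plumbing the nitpick asks for can be sketched with stubs; `compile_and_run` and `build_tensor_specs` are names taken from the review, but the bodies below are placeholders, not the project's actual implementation.

```python
SEED = 0  # stand-in for the module-level default

def build_tensor_specs(seed=SEED):
    # Stub: the real function builds tensor specs; we only record the seed.
    return {"seed": seed}

def compile_and_run(seed=None):
    # Forward an optional seed instead of always using the module default.
    return build_tensor_specs(seed=SEED if seed is None else seed)

assert compile_and_run()["seed"] == SEED        # default path unchanged
assert compile_and_run(seed=42)["seed"] == 42   # non-default seed respected
```

With this shape, a CLI flag can map straight onto `compile_and_run(seed=...)` without rebuilding the specs by hand.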
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8e13f7b3-1977-4189-b08b-a79aee74fdba
📒 Files selected for processing (4)
- examples/models/qwen3/qwen3_32b_decode.py
- examples/models/qwen3/qwen3_32b_decode_scope1.py
- examples/models/qwen3/qwen3_32b_decode_scope2.py
- examples/models/qwen3/qwen3_32b_decode_scope3.py
```python
def init_wq():
    torch.manual_seed(seed + 2)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5


def init_wk():
    torch.manual_seed(seed + 3)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5


def init_wv():
    torch.manual_seed(seed + 4)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5
```
The shared seed still won't reproduce the same scope-1 weights as the full decode example.
init_wq, init_wk, and init_wv sample from [-0.5, 0.5) / sqrt(hidden) here, while examples/models/qwen3/qwen3_32b_decode.py uses [0, 1) / sqrt(hidden) for the same tensors with the same offsets. That means a shared seed now gives different scope-1 parameters across the standalone and integrated examples, which makes seeded cross-checks misleading.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/models/qwen3/qwen3_32b_decode_scope1.py` around lines 177 - 187, The
init_wq/init_wk/init_wv functions use (torch.rand(...) - 0.5) / hidden_size**0.5
which yields [-0.5,0.5)/sqrt(hidden) and thus diverges from the full-decode
initialization; change them to torch.rand(...) / hidden_size**0.5 so they sample
[0,1)/sqrt(hidden) with the same seed offsets (seed+2/3/4) so scope-1 weights
match the full decode example.
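The seeded cross-check this comment has in mind can be sketched with a stdlib stand-in for torch (names and sizes are illustrative): before the fix the standalone and integrated initializers disagree under a shared seed, and after it they match.

```python
import random

N = 16  # illustrative hidden size

def full_decode_wq(seed):
    # Full decode example: rand / sqrt(hidden).
    rng = random.Random(seed + 2)
    return [rng.random() / N ** 0.5 for _ in range(N)]

def scope1_wq_before(seed):
    # Scope 1 before the fix: (rand - 0.5) / sqrt(hidden).
    rng = random.Random(seed + 2)
    return [(rng.random() - 0.5) / N ** 0.5 for _ in range(N)]

def scope1_wq_after(seed):
    # Scope 1 after the fix: same expression, same offset.
    rng = random.Random(seed + 2)
    return [rng.random() / N ** 0.5 for _ in range(N)]

assert scope1_wq_before(7) != full_decode_wq(7)  # mismatch before the fix
assert scope1_wq_after(7) == full_decode_wq(7)   # exact match after the fix
```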
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/models/qwen3/qwen3_32b_decode_scope2.py (1)
174-189: Remove the commented-out second-pass scaffolding.
Softmax is now computed in the same `sb` loop as the local `raw_scores`, but this dead commented block still reads like a half-removed two-pass implementation. Dropping it and updating the stage comment would make the new flow much clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/models/qwen3/qwen3_32b_decode_scope2.py` around lines 174 - 189, Remove the leftover commented-out second-pass scaffolding and update the stage comment to reflect that softmax is now computed inline: delete the commented block that references the second pl.slice of all_raw_scores and the surrounding commented parallel/chunk scaffolding (the lines mentioning pl.at(level=pl.Level.CORE_GROUP...), pl.parallel(ctx_blocks...), and the alternate scores_valid slice using all_raw_scores and sb * Q_HEAD_PAD), and update the nearby comment to indicate that scores_valid (created from raw_scores via pl.slice using Q_HEAD_BATCH, SEQ_TILE, valid_len computed from SEQ_TILE, ctx_len and s0) is handled in the same sb loop rather than a separate pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/models/qwen3/qwen3_32b_decode_scope2.py`:
- Line 498: The RunConfig invocation contains an incomplete keyword argument
pto_isa_commit= which is invalid syntax; open the RunConfig call (the
RunConfig(...) expression) and either remove the pto_isa_commit parameter
entirely or assign a valid value (e.g., pto_isa_commit=False or
pto_isa_commit=True) so the code parses; ensure the chosen boolean is consistent
with surrounding flags and tests that reference pto_isa_commit in this module.
---
Nitpick comments:
In `@examples/models/qwen3/qwen3_32b_decode_scope2.py`:
- Around line 174-189: Remove the leftover commented-out second-pass scaffolding
and update the stage comment to reflect that softmax is now computed inline:
delete the commented block that references the second pl.slice of all_raw_scores
and the surrounding commented parallel/chunk scaffolding (the lines mentioning
pl.at(level=pl.Level.CORE_GROUP...), pl.parallel(ctx_blocks...), and the
alternate scores_valid slice using all_raw_scores and sb * Q_HEAD_PAD), and
update the nearby comment to indicate that scores_valid (created from raw_scores
via pl.slice using Q_HEAD_BATCH, SEQ_TILE, valid_len computed from SEQ_TILE,
ctx_len and s0) is handled in the same sb loop rather than a separate pass.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0f4731d3-7bf7-4c2d-9fdb-816515ba8aff
📒 Files selected for processing (1)
examples/models/qwen3/qwen3_32b_decode_scope2.py
```python
        dump_passes=dump_passes,
        backend_type=backend,
        runtime_profiling=runtime_profiling,
        pto_isa_commit=
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
file="$(fd -p 'qwen3_32b_decode_scope2.py' examples | head -n1)"
python - <<'PY' "$file"
import ast
import pathlib
import sys

path = pathlib.Path(sys.argv[1])
src = path.read_text()
try:
    ast.parse(src, filename=str(path))
    print("parse ok")
except SyntaxError as exc:
    print(f"{exc.filename}:{exc.lineno}:{exc.offset}: {exc.msg}")
    if exc.text:
        print(exc.text.rstrip())
    raise
PY
```

Repository: hw-native-sys/pypto-lib
Length of output: 502
🏁 Script executed:
```bash
#!/bin/bash
fd -p 'qwen3_32b_decode_scope2.py' examples | head -n1 | xargs sed -n '490,505p'
```

Repository: hw-native-sys/pypto-lib
Length of output: 459
Fix the incomplete pto_isa_commit keyword argument.
`RunConfig(..., pto_isa_commit=)` on line 498 is invalid Python syntax, so the module cannot be parsed. Provide a value for this argument or remove it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/models/qwen3/qwen3_32b_decode_scope2.py` at line 498, The RunConfig
invocation contains an incomplete keyword argument pto_isa_commit= which is
invalid syntax; open the RunConfig call (the RunConfig(...) expression) and
either remove the pto_isa_commit parameter entirely or assign a valid value
(e.g., pto_isa_commit=False or pto_isa_commit=True) so the code parses; ensure
the chosen boolean is consistent with surrounding flags and tests that reference
pto_isa_commit in this module.
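The failure is easy to reproduce in isolation: a keyword argument with no value is rejected by the parser, so the whole module fails to import. The snippet below is a minimal stand-in, not the project's actual `RunConfig` call.

```python
import ast

# A keyword argument with no value is a syntax error at parse time.
broken = "RunConfig(runtime_profiling=runtime_profiling, pto_isa_commit=)"
try:
    ast.parse(broken)
    parses = True
except SyntaxError:
    parses = False
assert parses is False  # the dangling `pto_isa_commit=` cannot be parsed

# Supplying any value (the review suggests a boolean) makes it valid again.
ast.parse("RunConfig(runtime_profiling=runtime_profiling, pto_isa_commit=False)")
```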