Update: make Qwen3 decode inputs deterministic #106
ndleslx wants to merge 1 commit into hw-native-sys:main from
Conversation
- Add fixed seed support to the Qwen3 decode tensor initializers
- Keep init callables self-contained for golden_writer serialization
- Rename the scope-1 projection chunk identifiers for readability
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough: Adds deterministic seeding to the Qwen3 example tensor initializers via a module-level SEED constant.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Code Review
This pull request implements deterministic tensor initialization for the Qwen3 32B decode model and its scope-specific files by introducing a SEED constant and applying manual seed offsets. It also renames SCOPE1_K_CHUNK to PROJ_K_CHUNK for improved clarity. Review feedback identifies logic discrepancies in weight scaling for scope 1, inconsistent seed offsets across different scopes compared to the main model, and the need for lazy torch imports within builder functions to follow project performance standards.
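The seeding pattern summarized above can be sketched with the standard library's `random` module standing in for torch; every name and value below is illustrative, not taken from the PR.

```python
# Hypothetical sketch of the reviewed pattern: a module-level seed, a fixed
# per-tensor offset, and the heavy import deferred into the builder itself.
SEED = 0  # stand-in for the module-level constant the PR introduces

def make_init(offset, rows, cols):
    """Build a self-contained initializer: everything it needs is local."""
    def init():
        import random  # lazy import, mirroring the project's `import torch` convention
        rng = random.Random(SEED + offset)
        return [[rng.random() for _ in range(cols)] for _ in range(rows)]
    return init

init_wq = make_init(2, 4, 4)

# Re-running the callable reseeds from scratch, so the output is identical.
assert init_wq() == init_wq()
```

Because each callable reseeds on every invocation, a serialized copy of the initializer reproduces the same tensor wherever it runs.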
```python
def init_wq():
    torch.manual_seed(seed + 2)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5


def init_wk():
    torch.manual_seed(seed + 3)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5


def init_wv():
    torch.manual_seed(seed + 4)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5
```
There is a logic discrepancy in the initialization of wq, wk, and wv between this scope-specific file and the full model implementation in qwen3_32b_decode.py. In the full model, these weights are initialized using torch.rand(...) / scale. Additionally, following the project convention, torch should be imported lazily within these builder functions to avoid import overhead.
```python
def init_wq():
    import torch
    torch.manual_seed(seed + 2)
    return torch.rand(hidden_size, hidden_size) / hidden_size ** 0.5


def init_wk():
    import torch
    torch.manual_seed(seed + 3)
    return torch.rand(hidden_size, kv_hidden) / hidden_size ** 0.5


def init_wv():
    import torch
    torch.manual_seed(seed + 4)
    return torch.rand(hidden_size, kv_hidden) / hidden_size ** 0.5
```

References
- Lazy imports for 'torch' and 'pypto.runtime' are a project convention to avoid import overhead when only builder functions are used.
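The distribution mismatch flagged above can be made concrete with a stdlib stand-in for `torch.rand` (function names and the hidden size here are illustrative): under the same seed, every element of the scope-1 variant differs from the full-model variant by a constant 0.5 / sqrt(hidden).

```python
import random

HIDDEN = 16  # illustrative hidden size

def full_model_wq(seed):
    # Full model: samples in [0, 1) / sqrt(hidden).
    rng = random.Random(seed + 2)
    return [rng.random() / HIDDEN ** 0.5 for _ in range(HIDDEN)]

def scope1_wq(seed):
    # Scope 1 before the fix: samples in [-0.5, 0.5) / sqrt(hidden).
    rng = random.Random(seed + 2)
    return [(rng.random() - 0.5) / HIDDEN ** 0.5 for _ in range(HIDDEN)]

# Same seed, same offset, but each element is shifted by 0.5 / sqrt(HIDDEN).
delta = 0.5 / HIDDEN ** 0.5
assert all(abs((f - s) - delta) < 1e-12
           for f, s in zip(full_model_wq(0), scope1_wq(0)))
```

The shift is deterministic, so a seeded cross-check between the two files would never converge without aligning the expressions.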
```python
def init_seq_lens():
    if use_max_seq:
        return torch.full((batch,), max_seq, dtype=torch.int32)
    torch.manual_seed(seed)
    return torch.randint(1, max_seq + 1, (batch,), dtype=torch.int32)


def init_q_proj():
    torch.manual_seed(seed + 1)
    return torch.rand(batch, hidden) - 0.5


def init_k_proj():
    torch.manual_seed(seed + 2)
    return torch.rand(batch, kv_hidden) - 0.5


def init_v_proj():
    torch.manual_seed(seed + 3)
    return torch.rand(batch, kv_hidden) - 0.5


def init_rope_cos():
    torch.manual_seed(seed + 4)
    return torch.rand(max_seq, head_dim) - 0.5


def init_rope_sin():
    torch.manual_seed(seed + 5)
    return torch.rand(max_seq, head_dim) - 0.5


def init_k_cache():
    torch.manual_seed(seed + 6)
    return torch.rand(cache_rows, head_dim) - 0.5


def init_v_cache():
    torch.manual_seed(seed + 7)
    return torch.rand(cache_rows, head_dim) - 0.5
```
The seed offsets used here for shared tensors are inconsistent with the offsets used in the full model (qwen3_32b_decode.py). To ensure synchronization, these offsets should be updated. Furthermore, please use lazy imports for torch inside these initialization functions to adhere to the repository's performance guidelines.
```python
def init_seq_lens():
    import torch
    if use_max_seq:
        return torch.full((batch,), max_seq, dtype=torch.int32)
    torch.manual_seed(seed + 5)
    return torch.randint(1, max_seq + 1, (batch,), dtype=torch.int32)


def init_q_proj():
    import torch
    torch.manual_seed(seed + 14)
    return torch.rand(batch, hidden) - 0.5


def init_k_proj():
    import torch
    torch.manual_seed(seed + 15)
    return torch.rand(batch, kv_hidden) - 0.5


def init_v_proj():
    import torch
    torch.manual_seed(seed + 16)
    return torch.rand(batch, kv_hidden) - 0.5


def init_rope_cos():
    import torch
    torch.manual_seed(seed + 6)
    return torch.rand(max_seq, head_dim) - 0.5


def init_rope_sin():
    import torch
    torch.manual_seed(seed + 7)
    return torch.rand(max_seq, head_dim) - 0.5


def init_k_cache():
    import torch
    torch.manual_seed(seed + 8)
    return torch.rand(cache_rows, head_dim) - 0.5


def init_v_cache():
    import torch
    torch.manual_seed(seed + 9)
    return torch.rand(cache_rows, head_dim) - 0.5
```

References
- Lazy imports for 'torch' and 'pypto.runtime' are a project convention to avoid import overhead when only builder functions are used.
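One way to keep the scope files and the full model from drifting apart again (a hypothetical refactor, not code from the PR) is a single offset table that every file imports; the values below are the ones used in the suggested fix above.

```python
# Hypothetical shared offset table; offsets taken from the suggested fix.
SEED_OFFSETS = {
    "seq_lens": 5,
    "rope_cos": 6,
    "rope_sin": 7,
    "k_cache": 8,
    "v_cache": 9,
    "q_proj": 14,
    "k_proj": 15,
    "v_proj": 16,
}

def tensor_seed(base_seed, name):
    """Look up the deterministic seed for a named tensor."""
    return base_seed + SEED_OFFSETS[name]

# Distinct offsets guarantee a distinct random stream per tensor.
assert len(set(SEED_OFFSETS.values())) == len(SEED_OFFSETS)
```

Each scope file would then call `tensor_seed(seed, "q_proj")` instead of hard-coding `seed + 14`, so a renumbering only needs to happen in one place.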
```python
def init_attn_out():
    torch.manual_seed(seed)
    return torch.rand(batch, hidden_size) - 0.5


def init_hidden_states():
    torch.manual_seed(seed + 1)
    return torch.rand(batch, hidden_size) - 0.5


def init_wo():
    torch.manual_seed(seed + 2)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5


def init_post_rms_weight():
    return torch.ones(1, hidden_size)


def init_w_gate():
    torch.manual_seed(seed + 3)
    return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5


def init_w_up():
    torch.manual_seed(seed + 4)
    return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5


def init_w_down():
    torch.manual_seed(seed + 5)
    return (torch.rand(intermediate_size, hidden_size) - 0.5) / intermediate_size ** 0.5
```
The seed offsets for shared tensors do not match the offsets in qwen3_32b_decode.py. This prevents the isolated scope test from using the same data as the full model. Additionally, ensure torch is imported lazily within each builder function to follow project conventions.
```python
def init_attn_out():
    import torch
    torch.manual_seed(seed + 14)
    return torch.rand(batch, hidden_size) - 0.5


def init_hidden_states():
    import torch
    torch.manual_seed(seed)
    return torch.rand(batch, hidden_size) - 0.5


def init_wo():
    import torch
    torch.manual_seed(seed + 10)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5


def init_post_rms_weight():
    import torch
    return torch.ones(1, hidden_size)


def init_w_gate():
    import torch
    torch.manual_seed(seed + 11)
    return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5


def init_w_up():
    import torch
    torch.manual_seed(seed + 12)
    return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5


def init_w_down():
    import torch
    torch.manual_seed(seed + 13)
    return (torch.rand(intermediate_size, hidden_size) - 0.5) / intermediate_size ** 0.5
```

References
- Lazy imports for 'torch' and 'pypto.runtime' are a project convention to avoid import overhead when only builder functions are used.
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/models/qwen3/qwen3_32b_decode.py (1)
446-455: Consider plumbing `seed` through `compile_and_run()`.
`build_tensor_specs()` is configurable now, but the main example entrypoint still always goes through the module default. Surfacing the new parameter there would make non-default reproductions easier without rebuilding the specs manually.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/models/qwen3/qwen3_32b_decode.py` around lines 446 - 455, The example entrypoint should accept and pass the seed into build_tensor_specs via compile_and_run so non-default seeds are respected: update the main/example function that calls compile_and_run (and any wrapper that constructs tensor specs) to accept an optional seed parameter and forward it into build_tensor_specs(seed=seed) when creating specs; change the compile_and_run invocation signature or its caller to accept and propagate this seed rather than relying on the module-level SEED default, and ensure any tests or example CLI parsing also expose the seed argument.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/models/qwen3/qwen3_32b_decode_scope1.py`:
- Around line 177-187: The init_wq/init_wk/init_wv functions use
(torch.rand(...) - 0.5) / hidden_size**0.5 which yields [-0.5,0.5)/sqrt(hidden)
and thus diverges from the full-decode initialization; change them to
torch.rand(...) / hidden_size**0.5 so they sample [0,1)/sqrt(hidden) with the
same seed offsets (seed+2/3/4) so scope-1 weights match the full decode example.
---
Nitpick comments:
In `@examples/models/qwen3/qwen3_32b_decode.py`:
- Around line 446-455: The example entrypoint should accept and pass the seed
into build_tensor_specs via compile_and_run so non-default seeds are respected:
update the main/example function that calls compile_and_run (and any wrapper
that constructs tensor specs) to accept an optional seed parameter and forward
it into build_tensor_specs(seed=seed) when creating specs; change the
compile_and_run invocation signature or its caller to accept and propagate this
seed rather than relying on the module-level SEED default, and ensure any tests
or example CLI parsing also expose the seed argument.
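The plumbing the nitpick asks for can be sketched with stubs; `compile_and_run` and `build_tensor_specs` are names taken from the review, but the bodies below are placeholders, not the project's actual implementation.

```python
SEED = 0  # stand-in for the module-level default

def build_tensor_specs(seed=SEED):
    # Stub: the real function builds tensor specs; we only record the seed.
    return {"seed": seed}

def compile_and_run(seed=None):
    # Forward an optional seed instead of always using the module default.
    return build_tensor_specs(seed=SEED if seed is None else seed)

assert compile_and_run()["seed"] == SEED        # default path unchanged
assert compile_and_run(seed=42)["seed"] == 42   # non-default seed respected
```

With this shape, a CLI flag can map straight onto `compile_and_run(seed=...)` without rebuilding the specs by hand.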
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8e13f7b3-1977-4189-b08b-a79aee74fdba
📒 Files selected for processing (4)
- examples/models/qwen3/qwen3_32b_decode.py
- examples/models/qwen3/qwen3_32b_decode_scope1.py
- examples/models/qwen3/qwen3_32b_decode_scope2.py
- examples/models/qwen3/qwen3_32b_decode_scope3.py
```python
def init_wq():
    torch.manual_seed(seed + 2)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5


def init_wk():
    torch.manual_seed(seed + 3)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5


def init_wv():
    torch.manual_seed(seed + 4)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5
```
The shared seed still won't reproduce the same scope-1 weights as the full decode example.
init_wq, init_wk, and init_wv sample from [-0.5, 0.5) / sqrt(hidden) here, while examples/models/qwen3/qwen3_32b_decode.py uses [0, 1) / sqrt(hidden) for the same tensors with the same offsets. That means a shared seed now gives different scope-1 parameters across the standalone and integrated examples, which makes seeded cross-checks misleading.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/models/qwen3/qwen3_32b_decode_scope1.py` around lines 177 - 187, The
init_wq/init_wk/init_wv functions use (torch.rand(...) - 0.5) / hidden_size**0.5
which yields [-0.5,0.5)/sqrt(hidden) and thus diverges from the full-decode
initialization; change them to torch.rand(...) / hidden_size**0.5 so they sample
[0,1)/sqrt(hidden) with the same seed offsets (seed+2/3/4) so scope-1 weights
match the full decode example.
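The seeded cross-check this comment has in mind can be sketched with a stdlib stand-in for torch (names and sizes are illustrative): before the fix the standalone and integrated initializers disagree under a shared seed, and after it they match.

```python
import random

N = 16  # illustrative hidden size

def full_decode_wq(seed):
    # Full decode example: rand / sqrt(hidden).
    rng = random.Random(seed + 2)
    return [rng.random() / N ** 0.5 for _ in range(N)]

def scope1_wq_before(seed):
    # Scope 1 before the fix: (rand - 0.5) / sqrt(hidden).
    rng = random.Random(seed + 2)
    return [(rng.random() - 0.5) / N ** 0.5 for _ in range(N)]

def scope1_wq_after(seed):
    # Scope 1 after the fix: same expression, same offset.
    rng = random.Random(seed + 2)
    return [rng.random() / N ** 0.5 for _ in range(N)]

assert scope1_wq_before(7) != full_decode_wq(7)  # mismatch before the fix
assert scope1_wq_after(7) == full_decode_wq(7)   # exact match after the fix
```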
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/models/qwen3/qwen3_32b_decode_scope2.py (1)
174-189: Remove the commented-out second-pass scaffolding.
Softmax is now computed in the same `sb` loop as the local `raw_scores`, but this dead commented block still reads like a half-removed two-pass implementation. Dropping it and updating the stage comment would make the new flow much clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/models/qwen3/qwen3_32b_decode_scope2.py` around lines 174 - 189, Remove the leftover commented-out second-pass scaffolding and update the stage comment to reflect that softmax is now computed inline: delete the commented block that references the second pl.slice of all_raw_scores and the surrounding commented parallel/chunk scaffolding (the lines mentioning pl.at(level=pl.Level.CORE_GROUP...), pl.parallel(ctx_blocks...), and the alternate scores_valid slice using all_raw_scores and sb * Q_HEAD_PAD), and update the nearby comment to indicate that scores_valid (created from raw_scores via pl.slice using Q_HEAD_BATCH, SEQ_TILE, valid_len computed from SEQ_TILE, ctx_len and s0) is handled in the same sb loop rather than a separate pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/models/qwen3/qwen3_32b_decode_scope2.py`:
- Line 498: The RunConfig invocation contains an incomplete keyword argument
pto_isa_commit= which is invalid syntax; open the RunConfig call (the
RunConfig(...) expression) and either remove the pto_isa_commit parameter
entirely or assign a valid value (e.g., pto_isa_commit=False or
pto_isa_commit=True) so the code parses; ensure the chosen boolean is consistent
with surrounding flags and tests that reference pto_isa_commit in this module.
---
Nitpick comments:
In `@examples/models/qwen3/qwen3_32b_decode_scope2.py`:
- Around line 174-189: Remove the leftover commented-out second-pass scaffolding
and update the stage comment to reflect that softmax is now computed inline:
delete the commented block that references the second pl.slice of all_raw_scores
and the surrounding commented parallel/chunk scaffolding (the lines mentioning
pl.at(level=pl.Level.CORE_GROUP...), pl.parallel(ctx_blocks...), and the
alternate scores_valid slice using all_raw_scores and sb * Q_HEAD_PAD), and
update the nearby comment to indicate that scores_valid (created from raw_scores
via pl.slice using Q_HEAD_BATCH, SEQ_TILE, valid_len computed from SEQ_TILE,
ctx_len and s0) is handled in the same sb loop rather than a separate pass.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0f4731d3-7bf7-4c2d-9fdb-816515ba8aff
📒 Files selected for processing (1)
examples/models/qwen3/qwen3_32b_decode_scope2.py
```python
        dump_passes=dump_passes,
        backend_type=backend,
        runtime_profiling=runtime_profiling,
        pto_isa_commit=
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
file="$(fd -p 'qwen3_32b_decode_scope2.py' examples | head -n1)"
python - <<'PY' "$file"
import ast
import pathlib
import sys

path = pathlib.Path(sys.argv[1])
src = path.read_text()
try:
    ast.parse(src, filename=str(path))
    print("parse ok")
except SyntaxError as exc:
    print(f"{exc.filename}:{exc.lineno}:{exc.offset}: {exc.msg}")
    if exc.text:
        print(exc.text.rstrip())
    raise
PY
```

Repository: hw-native-sys/pypto-lib
Length of output: 502
🏁 Script executed:
```bash
#!/bin/bash
fd -p 'qwen3_32b_decode_scope2.py' examples | head -n1 | xargs sed -n '490,505p'
```

Repository: hw-native-sys/pypto-lib
Length of output: 459
Fix the incomplete pto_isa_commit keyword argument.
`RunConfig(..., pto_isa_commit=)` on line 498 is invalid Python syntax, so the module cannot be parsed. Provide a value for this argument or remove it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/models/qwen3/qwen3_32b_decode_scope2.py` at line 498, The RunConfig
invocation contains an incomplete keyword argument pto_isa_commit= which is
invalid syntax; open the RunConfig call (the RunConfig(...) expression) and
either remove the pto_isa_commit parameter entirely or assign a valid value
(e.g., pto_isa_commit=False or pto_isa_commit=True) so the code parses; ensure
the chosen boolean is consistent with surrounding flags and tests that reference
pto_isa_commit in this module.
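The failure is easy to reproduce in isolation: a keyword argument with no value is rejected by the parser, so the whole module fails to import. The snippet below is a minimal stand-in, not the project's actual `RunConfig` call.

```python
import ast

# A keyword argument with no value is a syntax error at parse time.
broken = "RunConfig(runtime_profiling=runtime_profiling, pto_isa_commit=)"
try:
    ast.parse(broken)
    parses = True
except SyntaxError:
    parses = False
assert parses is False  # the dangling `pto_isa_commit=` cannot be parsed

# Supplying any value (the review suggests a boolean) makes it valid again.
ast.parse("RunConfig(runtime_profiling=runtime_profiling, pto_isa_commit=False)")
```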