
Update: make Qwen3 decode inputs deterministic #106

Open
ndleslx wants to merge 1 commit into hw-native-sys:main from ndleslx:opt-scope2

Conversation

Contributor

@ndleslx ndleslx commented Apr 14, 2026

Summary

  • add fixed seed support to the Qwen3 decode tensor initializers
  • keep init callables self-contained so golden_writer can serialize generated inputs
  • rename the full decode scope-1 projection chunk identifiers for readability

Related Issues

  • None.

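The seeding scheme described in the summary can be sketched as follows. This is a minimal, hypothetical illustration of the pattern (a module-level default seed plus per-tensor seed offsets inside a self-contained builder), using Python's stdlib `random` as a stand-in for `torch.manual_seed`; all names are assumptions, not the PR's actual code.

```python
import random

SEED = 0  # module-level default, mirroring the PR's SEED constant

def build_tensor_specs(seed=SEED):
    # Each initializer reseeds with its own offset, so every tensor is
    # reproducible on its own, independent of call order. Keeping the
    # callables self-contained is what lets a serializer re-run them later.
    def init_hidden_states():
        rng = random.Random(seed)        # offset 0
        return [rng.random() - 0.5 for _ in range(4)]

    def init_wq():
        rng = random.Random(seed + 2)    # distinct per-tensor offset
        return [rng.random() for _ in range(4)]

    return {"hidden_states": init_hidden_states, "wq": init_wq}

specs = build_tensor_specs()
# Calling an initializer twice yields identical values:
assert specs["wq"]() == specs["wq"]()
```

The same structure applies with `torch.manual_seed(seed + N)` followed by `torch.rand(...)` in each initializer.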

coderabbitai bot commented Apr 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds deterministic seeding to the Qwen3 example tensor initializers (a module-level SEED = 0 and a seed parameter on build_tensor_specs) across scope 1/2/3, and renames SCOPE1_K_CHUNK → PROJ_K_CHUNK with corresponding updates in the main decode file and the golden reference.

Changes

• Main decode refactoring & seeding — examples/models/qwen3/qwen3_32b_decode.py
  Renamed SCOPE1_K_CHUNK → PROJ_K_CHUNK, updated derived names (scope1_hidden_blocks → proj_hidden_blocks) and projection/RMSNorm chunking/slices; added SEED and a seed arg to build_tensor_specs() and made tensor generation deterministic; updated the golden reference to use the new constant.
• Scope 1 seeding — examples/models/qwen3/qwen3_32b_decode_scope1.py
  Added SEED = 0 and a seed parameter on build_tensor_specs(); applied torch.manual_seed(seed + N) before generating hidden states and RMS/projection tensors.
• Scope 2 seeding & minor control tweak — examples/models/qwen3/qwen3_32b_decode_scope2.py
  Added SEED = 0 and a seed parameter; deterministic tensor initialization via torch.manual_seed(seed + N) for seq_lens, projection weights, RoPE, and KV caches; removed/disabled assembly of all_raw_scores and adjusted softmax-region slicing; added pto_isa_commit= in RunConfig.
• Scope 3 seeding — examples/models/qwen3/qwen3_32b_decode_scope3.py
  Added SEED = 0 and a seed parameter; applied per-tensor torch.manual_seed(seed + N) for attention output, hidden states, and various MLP/weight tensors; no other control-flow changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I seeded the fields from zero to one,
Tamed tensors until randomness' done,
Chunks renamed, slices aligned,
Reproducible hops through every design,
🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

• Docstring Coverage — ⚠️ Warning: Docstring coverage is 2.22%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

• Title check — ✅ Passed: The title accurately summarizes the main change: making Qwen3 decode inputs deterministic by adding seed support across all decoder files.
• Description check — ✅ Passed: The description is directly related to the changeset, covering the three main objectives: adding fixed seed support, keeping initializers self-contained, and renaming scope-1 projection identifiers.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements deterministic tensor initialization for the Qwen3 32B decode model and its scope-specific files by introducing a SEED constant and applying manual seed offsets. It also renames SCOPE1_K_CHUNK to PROJ_K_CHUNK for improved clarity. Review feedback identifies logic discrepancies in weight scaling for scope 1, inconsistent seed offsets across different scopes compared to the main model, and the need for lazy torch imports within builder functions to follow project performance standards.

Comment on lines 177 to 187
def init_wq():
    torch.manual_seed(seed + 2)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5

def init_wk():
    torch.manual_seed(seed + 3)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5

def init_wv():
    torch.manual_seed(seed + 4)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5


high

There is a logic discrepancy in the initialization of wq, wk, and wv between this scope-specific file and the full model implementation in qwen3_32b_decode.py. In the full model, these weights are initialized using torch.rand(...) / scale. Additionally, following the project convention, torch should be imported lazily within these builder functions to avoid import overhead.

    def init_wq():
        import torch
        torch.manual_seed(seed + 2)
        return torch.rand(hidden_size, hidden_size) / hidden_size ** 0.5

    def init_wk():
        import torch
        torch.manual_seed(seed + 3)
        return torch.rand(hidden_size, kv_hidden) / hidden_size ** 0.5

    def init_wv():
        import torch
        torch.manual_seed(seed + 4)
        return torch.rand(hidden_size, kv_hidden) / hidden_size ** 0.5
References
  1. Lazy imports for 'torch' and 'pypto.runtime' are a project convention to avoid import overhead when only builder functions are used.
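The lazy-import convention the comment references can be sketched like this. It is a hypothetical stand-in that uses a stdlib module in place of `torch`, since the point is only that the import cost is deferred until an initializer actually runs:

```python
def make_initializer(seed):
    """Self-contained builder: its dependency is imported on first call."""
    def init():
        # In the real code this would be `import torch` followed by
        # torch.manual_seed(seed); random is a lightweight stand-in here.
        import random
        rng = random.Random(seed)
        return [rng.random() for _ in range(3)]
    return init

init_wq = make_initializer(seed=0 + 2)  # builder created, nothing imported yet
weights = init_wq()                     # dependency imported here, on first use
```

Because the import lives inside `init`, a caller that only constructs the specs (e.g. to serialize them) never pays for the heavy dependency.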

Comment on lines 259 to 291
def init_seq_lens():
    if use_max_seq:
        return torch.full((batch,), max_seq, dtype=torch.int32)
    torch.manual_seed(seed)
    return torch.randint(1, max_seq + 1, (batch,), dtype=torch.int32)

def init_q_proj():
    torch.manual_seed(seed + 1)
    return torch.rand(batch, hidden) - 0.5

def init_k_proj():
    torch.manual_seed(seed + 2)
    return torch.rand(batch, kv_hidden) - 0.5

def init_v_proj():
    torch.manual_seed(seed + 3)
    return torch.rand(batch, kv_hidden) - 0.5

def init_rope_cos():
    torch.manual_seed(seed + 4)
    return torch.rand(max_seq, head_dim) - 0.5

def init_rope_sin():
    torch.manual_seed(seed + 5)
    return torch.rand(max_seq, head_dim) - 0.5

def init_k_cache():
    torch.manual_seed(seed + 6)
    return torch.rand(cache_rows, head_dim) - 0.5

def init_v_cache():
    torch.manual_seed(seed + 7)
    return torch.rand(cache_rows, head_dim) - 0.5


medium

The seed offsets used here for shared tensors are inconsistent with the offsets used in the full model (qwen3_32b_decode.py). To ensure synchronization, these offsets should be updated. Furthermore, please use lazy imports for torch inside these initialization functions to adhere to the repository's performance guidelines.

    def init_seq_lens():
        import torch
        if use_max_seq:
            return torch.full((batch,), max_seq, dtype=torch.int32)
        torch.manual_seed(seed + 5)
        return torch.randint(1, max_seq + 1, (batch,), dtype=torch.int32)

    def init_q_proj():
        import torch
        torch.manual_seed(seed + 14)
        return torch.rand(batch, hidden) - 0.5

    def init_k_proj():
        import torch
        torch.manual_seed(seed + 15)
        return torch.rand(batch, kv_hidden) - 0.5

    def init_v_proj():
        import torch
        torch.manual_seed(seed + 16)
        return torch.rand(batch, kv_hidden) - 0.5

    def init_rope_cos():
        import torch
        torch.manual_seed(seed + 6)
        return torch.rand(max_seq, head_dim) - 0.5

    def init_rope_sin():
        import torch
        torch.manual_seed(seed + 7)
        return torch.rand(max_seq, head_dim) - 0.5

    def init_k_cache():
        import torch
        torch.manual_seed(seed + 8)
        return torch.rand(cache_rows, head_dim) - 0.5

    def init_v_cache():
        import torch
        torch.manual_seed(seed + 9)
        return torch.rand(cache_rows, head_dim) - 0.5
References
  1. Lazy imports for 'torch' and 'pypto.runtime' are a project convention to avoid import overhead when only builder functions are used.

Comment on lines 215 to 240
def init_attn_out():
    torch.manual_seed(seed)
    return torch.rand(batch, hidden_size) - 0.5

def init_hidden_states():
    torch.manual_seed(seed + 1)
    return torch.rand(batch, hidden_size) - 0.5

def init_wo():
    torch.manual_seed(seed + 2)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5

def init_post_rms_weight():
    return torch.ones(1, hidden_size)

def init_w_gate():
    torch.manual_seed(seed + 3)
    return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5

def init_w_up():
    torch.manual_seed(seed + 4)
    return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5

def init_w_down():
    torch.manual_seed(seed + 5)
    return (torch.rand(intermediate_size, hidden_size) - 0.5) / intermediate_size ** 0.5


medium

The seed offsets for shared tensors do not match the offsets in qwen3_32b_decode.py. This prevents the isolated scope test from using the same data as the full model. Additionally, ensure torch is imported lazily within each builder function to follow project conventions.

    def init_attn_out():
        import torch
        torch.manual_seed(seed + 14)
        return torch.rand(batch, hidden_size) - 0.5

    def init_hidden_states():
        import torch
        torch.manual_seed(seed)
        return torch.rand(batch, hidden_size) - 0.5

    def init_wo():
        import torch
        torch.manual_seed(seed + 10)
        return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5

    def init_post_rms_weight():
        import torch
        return torch.ones(1, hidden_size)

    def init_w_gate():
        import torch
        torch.manual_seed(seed + 11)
        return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5

    def init_w_up():
        import torch
        torch.manual_seed(seed + 12)
        return (torch.rand(hidden_size, intermediate_size) - 0.5) / hidden_size ** 0.5

    def init_w_down():
        import torch
        torch.manual_seed(seed + 13)
        return (torch.rand(intermediate_size, hidden_size) - 0.5) / intermediate_size ** 0.5
References
  1. Lazy imports for 'torch' and 'pypto.runtime' are a project convention to avoid import overhead when only builder functions are used.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/models/qwen3/qwen3_32b_decode.py (1)

446-455: Consider plumbing seed through compile_and_run().

build_tensor_specs() is configurable now, but the main example entrypoint still always goes through the module default. Surfacing the new parameter there would make non-default reproductions easier without rebuilding the specs manually.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/qwen3/qwen3_32b_decode.py` around lines 446 - 455, The
example entrypoint should accept and pass the seed into build_tensor_specs via
compile_and_run so non-default seeds are respected: update the main/example
function that calls compile_and_run (and any wrapper that constructs tensor
specs) to accept an optional seed parameter and forward it into
build_tensor_specs(seed=seed) when creating specs; change the compile_and_run
invocation signature or its caller to accept and propagate this seed rather than
relying on the module-level SEED default, and ensure any tests or example CLI
parsing also expose the seed argument.
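Plumbing the seed through the entrypoint, as the comment suggests, could look roughly like this. A hedged sketch with stand-in bodies; the real signature of compile_and_run is not shown in this thread, so only the forwarding pattern is illustrated:

```python
SEED = 0  # module-level default

def build_tensor_specs(seed=SEED):
    # Stand-in for the real spec construction; only the plumbing matters here.
    return {"seed": seed}

def compile_and_run(seed=SEED):
    # Forward the caller's seed instead of silently using the module default,
    # so non-default reproductions don't require rebuilding specs by hand.
    specs = build_tensor_specs(seed=seed)
    return specs

assert compile_and_run(seed=7)["seed"] == 7
```

A CLI wrapper would expose the same parameter (e.g. `--seed`) and pass it straight through.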
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/models/qwen3/qwen3_32b_decode_scope1.py`:
- Around line 177-187: The init_wq/init_wk/init_wv functions use
(torch.rand(...) - 0.5) / hidden_size**0.5 which yields [-0.5,0.5)/sqrt(hidden)
and thus diverges from the full-decode initialization; change them to
torch.rand(...) / hidden_size**0.5 so they sample [0,1)/sqrt(hidden) with the
same seed offsets (seed+2/3/4) so scope-1 weights match the full decode example.

---

Nitpick comments:
In `@examples/models/qwen3/qwen3_32b_decode.py`:
- Around line 446-455: The example entrypoint should accept and pass the seed
into build_tensor_specs via compile_and_run so non-default seeds are respected:
update the main/example function that calls compile_and_run (and any wrapper
that constructs tensor specs) to accept an optional seed parameter and forward
it into build_tensor_specs(seed=seed) when creating specs; change the
compile_and_run invocation signature or its caller to accept and propagate this
seed rather than relying on the module-level SEED default, and ensure any tests
or example CLI parsing also expose the seed argument.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8e13f7b3-1977-4189-b08b-a79aee74fdba

📥 Commits

Reviewing files that changed from the base of the PR and between efce0d4 and aebc330.

📒 Files selected for processing (4)
  • examples/models/qwen3/qwen3_32b_decode.py
  • examples/models/qwen3/qwen3_32b_decode_scope1.py
  • examples/models/qwen3/qwen3_32b_decode_scope2.py
  • examples/models/qwen3/qwen3_32b_decode_scope3.py

Comment on lines 177 to 187
def init_wq():
    torch.manual_seed(seed + 2)
    return (torch.rand(hidden_size, hidden_size) - 0.5) / hidden_size ** 0.5

def init_wk():
    torch.manual_seed(seed + 3)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5

def init_wv():
    torch.manual_seed(seed + 4)
    return (torch.rand(hidden_size, kv_hidden) - 0.5) / hidden_size ** 0.5


⚠️ Potential issue | 🟡 Minor

The shared seed still won't reproduce the same scope-1 weights as the full decode example.

init_wq, init_wk, and init_wv sample from [-0.5, 0.5) / sqrt(hidden) here, while examples/models/qwen3/qwen3_32b_decode.py uses [0, 1) / sqrt(hidden) for the same tensors with the same offsets. That means a shared seed now gives different scope-1 parameters across the standalone and integrated examples, which makes seeded cross-checks misleading.
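Why the distribution mismatch breaks cross-checks even with identical seeds can be shown with a small stand-in (stdlib `random` in place of `torch.rand`; `hidden_size` is an arbitrary illustrative value):

```python
import random

def sample(seed, n=4):
    # Same seed -> same raw draws, mimicking torch.manual_seed + torch.rand.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

hidden_size = 16
scale = hidden_size ** 0.5

# Standalone scope-1 variant: centered into [-0.5, 0.5) before scaling.
wq_scope1 = [(x - 0.5) / scale for x in sample(seed=2)]
# Full-decode variant: raw [0, 1) before scaling.
wq_full = [x / scale for x in sample(seed=2)]

# Same seed, same offset, yet the transforms differ, so the weights diverge:
assert wq_scope1 != wq_full
# Every element is shifted by exactly 0.5 / scale:
assert all(abs((f - s) - 0.5 / scale) < 1e-12 for f, s in zip(wq_full, wq_scope1))
```

The raw random stream is identical in both cases; only the post-draw transform differs, which is why matching the seed offsets alone is not enough.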

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/qwen3/qwen3_32b_decode_scope1.py` around lines 177 - 187, The
init_wq/init_wk/init_wv functions use (torch.rand(...) - 0.5) / hidden_size**0.5
which yields [-0.5,0.5)/sqrt(hidden) and thus diverges from the full-decode
initialization; change them to torch.rand(...) / hidden_size**0.5 so they sample
[0,1)/sqrt(hidden) with the same seed offsets (seed+2/3/4) so scope-1 weights
match the full decode example.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/models/qwen3/qwen3_32b_decode_scope2.py (1)

174-189: Remove the commented-out second-pass scaffolding.

Softmax is now computed in the same sb loop as the local raw_scores, but this dead commented block still reads like a half-removed two-pass implementation. Dropping it and updating the stage comment would make the new flow much clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/qwen3/qwen3_32b_decode_scope2.py` around lines 174 - 189,
Remove the leftover commented-out second-pass scaffolding and update the stage
comment to reflect that softmax is now computed inline: delete the commented
block that references the second pl.slice of all_raw_scores and the surrounding
commented parallel/chunk scaffolding (the lines mentioning
pl.at(level=pl.Level.CORE_GROUP...), pl.parallel(ctx_blocks...), and the
alternate scores_valid slice using all_raw_scores and sb * Q_HEAD_PAD), and
update the nearby comment to indicate that scores_valid (created from raw_scores
via pl.slice using Q_HEAD_BATCH, SEQ_TILE, valid_len computed from SEQ_TILE,
ctx_len and s0) is handled in the same sb loop rather than a separate pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/models/qwen3/qwen3_32b_decode_scope2.py`:
- Line 498: The RunConfig invocation contains an incomplete keyword argument
pto_isa_commit= which is invalid syntax; open the RunConfig call (the
RunConfig(...) expression) and either remove the pto_isa_commit parameter
entirely or assign a valid value (e.g., pto_isa_commit=False or
pto_isa_commit=True) so the code parses; ensure the chosen boolean is consistent
with surrounding flags and tests that reference pto_isa_commit in this module.

---

Nitpick comments:
In `@examples/models/qwen3/qwen3_32b_decode_scope2.py`:
- Around line 174-189: Remove the leftover commented-out second-pass scaffolding
and update the stage comment to reflect that softmax is now computed inline:
delete the commented block that references the second pl.slice of all_raw_scores
and the surrounding commented parallel/chunk scaffolding (the lines mentioning
pl.at(level=pl.Level.CORE_GROUP...), pl.parallel(ctx_blocks...), and the
alternate scores_valid slice using all_raw_scores and sb * Q_HEAD_PAD), and
update the nearby comment to indicate that scores_valid (created from raw_scores
via pl.slice using Q_HEAD_BATCH, SEQ_TILE, valid_len computed from SEQ_TILE,
ctx_len and s0) is handled in the same sb loop rather than a separate pass.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0f4731d3-7bf7-4c2d-9fdb-816515ba8aff

📥 Commits

Reviewing files that changed from the base of the PR and between aebc330 and 5f3abd5.

📒 Files selected for processing (1)
  • examples/models/qwen3/qwen3_32b_decode_scope2.py

dump_passes=dump_passes,
backend_type=backend,
runtime_profiling=runtime_profiling,
pto_isa_commit=


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="$(fd -p 'qwen3_32b_decode_scope2.py' examples | head -n1)"

python - <<'PY' "$file"
import ast
import pathlib
import sys

path = pathlib.Path(sys.argv[1])
src = path.read_text()

try:
    ast.parse(src, filename=str(path))
    print("parse ok")
except SyntaxError as exc:
    print(f"{exc.filename}:{exc.lineno}:{exc.offset}: {exc.msg}")
    if exc.text:
        print(exc.text.rstrip())
    raise
PY

Repository: hw-native-sys/pypto-lib

Length of output: 502


🏁 Script executed:

#!/bin/bash
fd -p 'qwen3_32b_decode_scope2.py' examples | head -n1 | xargs sed -n '490,505p'

Repository: hw-native-sys/pypto-lib

Length of output: 459


Fix the incomplete pto_isa_commit keyword argument.

RunConfig(..., pto_isa_commit=) on line 498 is invalid Python syntax. The module cannot parse. Provide a value for this argument or remove it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/qwen3/qwen3_32b_decode_scope2.py` at line 498, The RunConfig
invocation contains an incomplete keyword argument pto_isa_commit= which is
invalid syntax; open the RunConfig call (the RunConfig(...) expression) and
either remove the pto_isa_commit parameter entirely or assign a valid value
(e.g., pto_isa_commit=False or pto_isa_commit=True) so the code parses; ensure
the chosen boolean is consistent with surrounding flags and tests that reference
pto_isa_commit in this module.
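The failure is purely syntactic and can be reproduced with `ast.parse`, much as the bot's verification script does. `False` below is just one possible placeholder value, not necessarily the correct one for this project:

```python
import ast

broken = "cfg = RunConfig(dump_passes=dump_passes, pto_isa_commit=)"
fixed = "cfg = RunConfig(dump_passes=dump_passes, pto_isa_commit=False)"

def parses(src):
    # Parse-only check: names need not be defined for ast.parse to succeed.
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

assert not parses(broken)  # trailing `pto_isa_commit=` is a SyntaxError
assert parses(fixed)       # any expression after `=` restores parseability
```

Because the error is caught at parse time, the whole module fails to import, so nothing downstream of this file runs until the argument is completed or removed.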


Labels: None yet
Projects: None yet
1 participant