🐛 Fix RL Training and Improve Structure by flowerthrower · Pull Request #573 · munich-quantum-toolkit/predictor

flowerthrower · 2026-01-22T13:35:29Z

Description

This PR addresses critical bugs in the RL training process with the following key changes:

Structure Improvements:

- Redesigned action validation logic (predictorenv.py): Rewrote determine_valid_actions_for_state() as a more structured (but equivalent) state machine that explicitly tracks three circuit states (synthesized, laid_out, routed) and handles 6 different state combinations.
- Added helper methods is_circuit_laid_out() and is_circuit_routed() to replace the buggy CheckMap pass with more reliable state checking. The new logic supports both the original restricted MDP and a flexible general MDP mode.
- Fixed type annotation (actions.py): Corrected do_while parameter type from dict[str, Circuit] to PropertySet and added missing import for Qiskit's PropertySet.
- Added reproducibility (predictor.py): Set random seed for non-test training runs to ensure reproducible results.
- Improved VF2Layout error handling (predictorenv.py): Replaced assertion failures with warning logs when VF2Layout doesn't find a solution, preventing crashes during training.
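
The effect of a fixed seed can be illustrated with a minimal sketch. It uses only Python's standard `random` module as a stand-in; the `sample_rollout` helper and its parameters are hypothetical and are not part of the PR, which seeds the training run itself.

```python
from __future__ import annotations

import random


def sample_rollout(seed: int | None, num_actions: int = 22, steps: int = 5) -> list[int]:
    """Sample a short sequence of action indices from an optionally seeded RNG."""
    rng = random.Random(seed)  # seed=None falls back to OS entropy
    return [rng.randrange(num_actions) for _ in range(steps)]


# Two runs with the same fixed seed produce identical action sequences.
first = sample_rollout(seed=0)
second = sample_rollout(seed=0)
print(first == second)  # True
```

With `seed=None` the two sequences would generally differ, which is the non-reproducible behavior the seeding change is meant to avoid.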

Test Updates:

- Suppressed deprecation warnings in tket routing test

denialhaag · 2026-01-25T10:15:55Z

It looks like #572 has created some conflicts here. Since the ty errors were quite ugly to fix, let me know if you want me to take care of the rebase so you don't have to come up with the same workarounds. 🙂

flowerthrower · 2026-02-23T20:39:05Z

@coderabbitai full review

coderabbitai · 2026-02-23T20:39:11Z

✅ Actions performed

Full review triggered.

coderabbitai · 2026-02-23T20:47:14Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Enhanced circuit state awareness with explicit validation for layout and routing phases.
Improvements
- Increased training reproducibility through deterministic initialization.
- Improved robustness in post-layout handling with graceful error management.

Walkthrough

This pull request updates action validation in the reinforcement learning module by changing the do_while predicate to use PropertySet instead of circuit dictionaries, adds deterministic seeding to the training path for reproducibility, introduces three new circuit state validation methods to explicitly check layout and routing status, relaxes a strict assertion in layout postprocessing to a warning, and updates a test to use an additional parameter in the pytket conversion function.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Type System Updates `src/mqt/predictor/rl/actions.py` | Updated `do_while` callable signature from `Callable[[dict[str, Circuit]], bool]` to `Callable[[PropertySet], bool]` with added PropertySet import under TYPE_CHECKING. |
| Environment State Validation `src/mqt/predictor/rl/predictorenv.py` | Added three new methods: `is_circuit_laid_out` (checks if all logical qubits assigned to physical qubits), `is_circuit_routed` (validates directed two-qubit gates on device coupling map), and `determine_valid_actions_for_state` (returns valid action list based on circuit state). Replaced hard assertion in VF2Layout postprocessing with warning. Updated imports to include Layout and removed CheckMap. |
| Training Configuration `src/mqt/predictor/rl/predictor.py` | Set random seed to 0 in non-test training path to ensure deterministic behavior alongside existing test-mode seeding. |
| Test Updates `tests/compilation/test_integration_further_SDKs.py` | Updated `test_tket_routing` to call `tk_to_qiskit` with `perm_warning=False` parameter. |

Sequence Diagram(s)

sequenceDiagram
    participant Agent as RL Agent
    participant Env as PredictorEnv
    participant Circuit as QuantumCircuit
    participant Layout as Layout Object
    participant CouplingMap as Device CouplingMap
    
    Agent->>Env: Request valid actions for current state
    Env->>Circuit: Check circuit properties
    Env->>Layout: Retrieve layout information
    Env->>Env: is_circuit_laid_out(circuit, layout)
    rect rgba(100, 150, 200, 0.5)
        Note over Env: Verify all logical qubits<br/>mapped to physical qubits
    end
    Env->>CouplingMap: Get device coupling map
    Env->>Env: is_circuit_routed(circuit, coupling_map)
    rect rgba(100, 150, 200, 0.5)
        Note over Env: Verify directed 2-qubit gates<br/>respect device topology
    end
    Env->>Env: determine_valid_actions_for_state()
    rect rgba(100, 150, 200, 0.5)
        Note over Env: Synthesize valid actions based on<br/>synthesis, layout, routing, mapping states
    end
    Env->>Agent: Return list of valid action indices

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 The circuit takes its rightful place,
With validation at each trace,
State checks bloom where assertions fell,
PropertySet's tale to tell,
Our layout dances, routed true! 🎭

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The title uses emoji and is too vague; 'Fix RL Training and Improve Structure' is generic and doesn't clearly convey the main changes or scope of the PR. | Replace with a more descriptive, emoji-free title that specifically mentions the key fixes, such as: 'Fix action validation state machine and type annotations in RL training'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description comprehensively covers the key changes across all modified files, explains the motivation for fixes, and includes details about bug resolutions and improvements. However, the checklist items are not explicitly marked as completed. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/mqt/predictor/rl/predictorenv.py`:
- Around line 435-447: The current is_circuit_laid_out only checks qubits that
appear in instructions and misses idle logical qubits; update
is_circuit_laid_out (and its TranspileLayout handling) to validate every logical
qubit from the circuit (e.g., iterate circuit.qubits or
range(circuit.num_qubits)) against layout.get_virtual_bits() instead of
iterating only instr.qubits, and return False if any circuit qubit is not
present in v2p; keep the existing fallback to layout.final_layout or
layout.initial_layout and handle missing/None layout gracefully.
- Around line 360-363: Replace the eager string concatenation in the VF2Layout
check with logger-style formatting: instead of adding
pm.property_set["VF2Layout_stop_reason"] to the message, call logger.warning
with a format string and pass pm.property_set["VF2Layout_stop_reason"] as a
separate argument (referencing VF2Layout_stop_reason, VF2LayoutStopReason,
pm.property_set and logger.warning in predictorenv.py) so the message is
constructed lazily and satisfies Ruff G003.
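
The second finding can be sketched as follows. This is a minimal illustration with the standard `logging` module only; the logger name, the `warn_on_layout_failure` helper, and the message wording are assumptions made for the example and do not reproduce the code in predictorenv.py.

```python
import logging

logger = logging.getLogger("mqt-predictor-sketch")  # hypothetical logger name


def warn_on_layout_failure(stop_reason: object) -> None:
    """Log a layout failure using logger-style (lazy) formatting."""
    # Eager variant (the pattern Ruff G003 flags): the message string is built
    # even when WARNING records are filtered out.
    # logger.warning("VF2Layout did not find a solution: " + str(stop_reason))

    # Lazy variant: the logging module interpolates %s only if the record is emitted.
    logger.warning("VF2Layout did not find a solution: %s", stop_reason)


warn_on_layout_failure("no solution found")
```

Passing the stop reason as a separate argument also keeps it available on the log record for handlers that inspect it.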

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26e3b01 and 825773f.

📒 Files selected for processing (4)

src/mqt/predictor/rl/actions.py
src/mqt/predictor/rl/predictor.py
src/mqt/predictor/rl/predictorenv.py
tests/compilation/test_integration_further_SDKs.py

coderabbitai · 2026-02-23T20:47:17Z

src/mqt/predictor/rl/predictorenv.py

+    def is_circuit_laid_out(self, circuit: QuantumCircuit, layout: TranspileLayout | Layout) -> bool:
+        """True if every logical qubit in the circuit has a physical assignment."""
+        if isinstance(layout, TranspileLayout):
+            # Use final_layout if available; otherwise fallback to initial_layout
+            layout = layout.final_layout or layout.initial_layout

-        if not only_nat_gates:
-            actions = self.actions_synthesis_indices + self.actions_opt_indices
-            if self.layout is not None:
-                actions += self.actions_routing_indices
-            return actions
+        v2p = layout.get_virtual_bits()
+        for instr in circuit.data:
+            for q in instr.qubits:
+                if q not in v2p:
+                    # Logical qubit not assigned
+                    return False
+        return True


⚠️ Potential issue | 🟡 Minor

Validate all logical qubits, not just those appearing in instructions.
The docstring says “every logical qubit”; idle qubits are currently skipped.

🔧 Suggested fix

-        for instr in circuit.data:
-            for q in instr.qubits:
-                if q not in v2p:
-                    # Logical qubit not assigned
-                    return False
+        for q in circuit.qubits:
+            if q not in v2p:
+                # Logical qubit not assigned
+                return False

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/mqt/predictor/rl/predictorenv.py` around lines 435 - 447, The current is_circuit_laid_out only checks qubits that appear in instructions and misses idle logical qubits; update is_circuit_laid_out (and its TranspileLayout handling) to validate every logical qubit from the circuit (e.g., iterate circuit.qubits or range(circuit.num_qubits)) against layout.get_virtual_bits() instead of iterating only instr.qubits, and return False if any circuit qubit is not present in v2p; keep the existing fallback to layout.final_layout or layout.initial_layout and handle missing/None layout gracefully.
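
The difference between the two loops can be shown with a small self-contained sketch. `ToyCircuit` is a hypothetical stand-in for a circuit (it is not Qiskit's `QuantumCircuit`), and `v2p` is a plain dictionary playing the role of the virtual-to-physical mapping.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCircuit:
    """Hypothetical stand-in for a circuit: declared qubits plus instructions."""

    qubits: list[str]
    # Each entry lists the qubits touched by one instruction.
    data: list[tuple[str, ...]] = field(default_factory=list)


def laid_out_instructions_only(circuit: ToyCircuit, v2p: dict[str, int]) -> bool:
    """Original approach: only qubits that appear in some instruction are checked."""
    return all(q in v2p for instr in circuit.data for q in instr)


def laid_out_all_qubits(circuit: ToyCircuit, v2p: dict[str, int]) -> bool:
    """Suggested approach: every declared qubit must have a physical assignment."""
    return all(q in v2p for q in circuit.qubits)


# q2 is declared but idle, and the layout does not assign it.
circuit = ToyCircuit(qubits=["q0", "q1", "q2"], data=[("q0", "q1")])
v2p = {"q0": 3, "q1": 4}

print(laid_out_instructions_only(circuit, v2p))  # True: the idle qubit is never inspected
print(laid_out_all_qubits(circuit, v2p))  # False: q2 has no physical qubit
```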

flowerthrower · 2026-03-04T19:10:49Z

Problem

After a Qiskit layout pass (e.g., DenseLayout), the MDP state became:

synthesized = True
laid_out = True
routed = False

determine_valid_actions_for_state() correctly allowed only routing actions.

However, action_masks() filtered out all TKET actions because self.layout was not None.
Since the only registered routing action was RoutingPass (from TKET), this removed the last available routing option and produced an all-False action mask.
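
A minimal sketch of how the two filters can combine into an empty mask is given below. The action table, the `action_mask` function, and its parameters are illustrative assumptions; the real environment registers more actions and implements the masking differently.

```python
# Hypothetical action table: action name -> originating SDK.
ACTIONS = {
    "DenseLayout": "qiskit",
    "RoutingPass": "tket",
    "Optimize1qGates": "qiskit",
    "terminate": "none",
}


def action_mask(valid_names: set[str], layout_is_set: bool) -> list[bool]:
    """Combine the state-machine result with the TKET filter described above."""
    mask = []
    for name, origin in ACTIONS.items():
        allowed = name in valid_names
        if layout_is_set and origin == "tket":
            allowed = False  # TKET actions are filtered out once a layout exists
        mask.append(allowed)
    return mask


# State after a Qiskit layout pass: only routing is valid, and the only router is TKET's.
mask = action_mask(valid_names={"RoutingPass"}, layout_is_set=True)
print(mask)  # [False, False, False, False]
print(any(mask))  # False: no action is left to choose
```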

Consequence in MaskablePPO

sb3-contrib's MaskablePPO handles an all-False mask by setting all logits to a large negative number instead of negative infinity.

Applying softmax to these equal values results in a uniform distribution, meaning every action — including terminate — had a probability of about 1/22 (~4.5%).
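
The uniform distribution follows directly from the softmax definition, as this pure-Python sketch shows. The constant `HUGE_NEG = -1e8` is an assumed stand-in for the "large negative number"; the count of 22 actions is taken from the figure quoted above.

```python
import math

HUGE_NEG = -1e8  # assumed stand-in for the large negative logit
NUM_ACTIONS = 22


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: shift by the maximum before exponentiating."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# With every action masked, all logits are replaced by the same constant.
probs = softmax([HUGE_NEG] * NUM_ACTIONS)
print(probs[0])  # 1/22, roughly 0.045
```

Because softmax depends only on differences between logits, any finite constant gives the same result: every action, valid or not, receives probability 1/22.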

During training with 1000 timesteps, the probability that terminate would never be sampled is extremely small (<0.003%), making a crash almost inevitable.

This never came up in tests that only executed 100 steps.
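
The scale of these probabilities can be checked with a short back-of-the-envelope calculation. It assumes that every counted step is taken in the deadlocked state and that actions are sampled independently and uniformly with probability 1/22; the thread does not state how many of the 1000 timesteps were actually spent in that state, so the step counts below are illustrative.

```python
import math

P_TERMINATE = 1 / 22  # uniform probability of sampling terminate under an all-False mask


def prob_never_terminate(deadlocked_steps: int) -> float:
    """Probability that terminate is not sampled in any of the deadlocked steps."""
    return (1 - P_TERMINATE) ** deadlocked_steps


def steps_needed(threshold: float) -> int:
    """Smallest number of deadlocked steps that pushes the probability to the threshold or below."""
    return math.ceil(math.log(threshold) / math.log(1 - P_TERMINATE))


print(f"{prob_never_terminate(100):.4f}")  # 0.0095, i.e. roughly 1%
print(steps_needed(0.00003))  # 224 deadlocked steps to get below 0.003%
```

Under these assumptions, the bound of 0.003% is reached after about 224 deadlocked steps, and the probability keeps shrinking geometrically beyond that.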

Crash Mechanism

When the agent sampled terminate in this deadlocked state:

- step() re-evaluated determine_valid_actions_for_state()
- It correctly detected routed = False
- Reward calculation failed

flowerthrower added 9 commits January 22, 2026 10:17

🐛 gracefully handle VF2Layout no-solution-found

8a86438

disable progress bar

f34a9ae

🐛 only consider valid circuits for terminal action

7a89375

🚧 debug predictor

af50584

🐛handle layout fail more gracefully

1e3d3d4

🚧 restructure state machine

25c8a4a

🚧 update state machine

7eb5138

🎨 add strict policy

37653fb

🚧 add og paper strategy

8802fbd

flowerthrower and others added 7 commits February 9, 2026 13:14

🎨 fix og policy

6db8ee7

⏪ remove thesis changes

dcc3810

Merge remote-tracking branch 'origin/main' into bugfix

440d54c

⏪ revert thesis updates

450d2dc

⏪ use og strategy

e69cf51

🐛 fix no-layout found bug

934e46c

Merge branch 'main' into bugfix

825773f

flowerthrower changed the title ~~Fix RL training bugs~~ 🐛 Fix RL Training and Improve Structure Feb 23, 2026

coderabbitai bot requested changes Feb 23, 2026

flowerthrower and others added 7 commits February 23, 2026 22:06

Update src/mqt/predictor/rl/predictorenv.py

8b27982

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>

🎨 docstring

729ade7

Merge branch 'main' into bugfix

983a4f6

🐛 fix routing check

de36b57

Merge commit '983a4f61b72d10af05bfe762bb72f27aa9304fac' into bugfix

bc667bd

🐛 use find qubit

7c745fe

🐛 fix no valid action bug

0d46fa9

flowerthrower wants to merge 23 commits into main from bugfix

2 participants
