Skip to content

🐛 Fix RL Training and Improve Structure#573

Draft
flowerthrower wants to merge 23 commits intomainfrom
bugfix
Draft

🐛 Fix RL Training and Improve Structure#573
flowerthrower wants to merge 23 commits intomainfrom
bugfix

Conversation

@flowerthrower
Copy link
Collaborator

@flowerthrower flowerthrower commented Jan 22, 2026

Description

This PR addresses critical bugs in the RL training process with the following key changes:

Structure Improvements:

  • Redesigned action validation logic (predictorenv.py): Rewrote determine_valid_actions_for_state() with a more structured (but equivalent) state machine that explicitly tracks three circuit states (synthesized, laid_out, routed) and handles 6 different state combinations.
    - Added helper methods is_circuit_laid_out() and is_circuit_routed() to replace the buggy CheckMap pass with more reliable state checking. The new logic supports both the original restricted MDP and a flexible general MDP mode.

  • Fixed type annotation (actions.py): Corrected do_while parameter type from dict[str, Circuit] to PropertySet and added missing import for Qiskit's PropertySet.

  • Added reproducibility (predictor.py): Set random seed for non-test training runs to ensure reproducible results.

  • Improved VF2Layout error handling (predictorenv.py): Replaced assertion failures with warning logs when VF2Layout doesn't find a solution, preventing crashes during training.

Test Updates:

  • Suppressed deprecation warnings in tket routing test

@denialhaag
Copy link
Member

denialhaag commented Jan 25, 2026

It looks like #572 has created some conflicts here. Since the ty errors were quite ugly to fix, let me know if you want me to take care of the rebase so you don't have to come up with the same workarounds. 🙂

@flowerthrower flowerthrower changed the title Fix RL training bugs 🐛 Fix RL Training and Improve Structure Feb 23, 2026
@flowerthrower
Copy link
Collaborator Author

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Feb 23, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Enhanced circuit state awareness with explicit validation for layout and routing phases.
  • Improvements

    • Increased training reproducibility through deterministic initialization.
    • Improved robustness in post-layout handling with graceful error management.

Walkthrough

This pull request updates action validation in the reinforcement learning module by changing the do_while predicate to use PropertySet instead of circuit dictionaries, adds deterministic seeding to the training path for reproducibility, introduces three new circuit state validation methods to explicitly check layout and routing status, relaxes a strict assertion in layout postprocessing to a warning, and updates a test to use an additional parameter in the pytket conversion function.

Changes

Cohort / File(s) Summary
Type System Updates
src/mqt/predictor/rl/actions.py
Updated do_while callable signature from Callable[[dict[str, Circuit]], bool] to Callable[[PropertySet], bool] with added PropertySet import under TYPE_CHECKING.
Environment State Validation
src/mqt/predictor/rl/predictorenv.py
Added three new methods: is_circuit_laid_out (checks if all logical qubits assigned to physical qubits), is_circuit_routed (validates directed two-qubit gates on device coupling map), and determine_valid_actions_for_state (returns valid action list based on circuit state). Replaced hard assertion in VF2Layout postprocessing with warning. Updated imports to include Layout and removed CheckMap.
Training Configuration
src/mqt/predictor/rl/predictor.py
Set random seed to 0 in non-test training path to ensure deterministic behavior alongside existing test-mode seeding.
Test Updates
tests/compilation/test_integration_further_SDKs.py
Updated test_tket_routing to call tk_to_qiskit with perm_warning=False parameter.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as RL Agent
    participant Env as PredictorEnv
    participant Circuit as QuantumCircuit
    participant Layout as Layout Object
    participant CouplingMap as Device CouplingMap
    
    Agent->>Env: Request valid actions for current state
    Env->>Circuit: Check circuit properties
    Env->>Layout: Retrieve layout information
    Env->>Env: is_circuit_laid_out(circuit, layout)
    rect rgba(100, 150, 200, 0.5)
        Note over Env: Verify all logical qubits<br/>mapped to physical qubits
    end
    Env->>CouplingMap: Get device coupling map
    Env->>Env: is_circuit_routed(circuit, coupling_map)
    rect rgba(100, 150, 200, 0.5)
        Note over Env: Verify directed 2-qubit gates<br/>respect device topology
    end
    Env->>Env: determine_valid_actions_for_state()
    rect rgba(100, 150, 200, 0.5)
        Note over Env: Synthesize valid actions based on<br/>synthesis, layout, routing, mapping states
    end
    Env->>Agent: Return list of valid action indices
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 The circuit takes its rightful place,
With validation at each trace,
State checks bloom where assertions fell,
PropertySet's tale to tell,
Our layout dances, routed true! 🎭

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title uses emoji and is too vague; 'Fix RL Training and Improve Structure' is generic and doesn't clearly convey the main changes or scope of the PR. Replace with a more descriptive, emoji-free title that specifically mentions the key fixes, such as: 'Fix action validation state machine and type annotations in RL training'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The description comprehensively covers the key changes across all modified files, explains the motivation for fixes, and includes details about bug resolutions and improvements. However, the checklist items are not explicitly marked as completed.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bugfix

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/mqt/predictor/rl/predictorenv.py`:
- Around line 435-447: The current is_circuit_laid_out only checks qubits that
appear in instructions and misses idle logical qubits; update
is_circuit_laid_out (and its TranspileLayout handling) to validate every logical
qubit from the circuit (e.g., iterate circuit.qubits or
range(circuit.num_qubits)) against layout.get_virtual_bits() instead of
iterating only instr.qubits, and return False if any circuit qubit is not
present in v2p; keep the existing fallback to layout.final_layout or
layout.initial_layout and handle missing/None layout gracefully.
- Around line 360-363: Replace the eager string concatenation in the VF2Layout
check with logger-style formatting: instead of adding
pm.property_set["VF2Layout_stop_reason"] to the message, call logger.warning
with a format string and pass pm.property_set["VF2Layout_stop_reason"] as a
separate argument (referencing VF2Layout_stop_reason, VF2LayoutStopReason,
pm.property_set and logger.warning in predictorenv.py) so the message is
constructed lazily and satisfies Ruff G003.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26e3b01 and 825773f.

📒 Files selected for processing (4)
  • src/mqt/predictor/rl/actions.py
  • src/mqt/predictor/rl/predictor.py
  • src/mqt/predictor/rl/predictorenv.py
  • tests/compilation/test_integration_further_SDKs.py

Comment on lines +435 to +447
def is_circuit_laid_out(self, circuit: QuantumCircuit, layout: TranspileLayout | Layout) -> bool:
"""True if every logical qubit in the circuit has a physical assignment."""
if isinstance(layout, TranspileLayout):
# Use final_layout if available; otherwise fallback to initial_layout
layout = layout.final_layout or layout.initial_layout

if not only_nat_gates:
actions = self.actions_synthesis_indices + self.actions_opt_indices
if self.layout is not None:
actions += self.actions_routing_indices
return actions
v2p = layout.get_virtual_bits()
for instr in circuit.data:
for q in instr.qubits:
if q not in v2p:
# Logical qubit not assigned
return False
return True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate all logical qubits, not just those appearing in instructions.
The docstring says “every logical qubit”; idle qubits are currently skipped.

🔧 Suggested fix
-        for instr in circuit.data:
-            for q in instr.qubits:
-                if q not in v2p:
-                    # Logical qubit not assigned
-                    return False
+        for q in circuit.qubits:
+            if q not in v2p:
+                # Logical qubit not assigned
+                return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mqt/predictor/rl/predictorenv.py` around lines 435 - 447, The current
is_circuit_laid_out only checks qubits that appear in instructions and misses
idle logical qubits; update is_circuit_laid_out (and its TranspileLayout
handling) to validate every logical qubit from the circuit (e.g., iterate
circuit.qubits or range(circuit.num_qubits)) against layout.get_virtual_bits()
instead of iterating only instr.qubits, and return False if any circuit qubit is
not present in v2p; keep the existing fallback to layout.final_layout or
layout.initial_layout and handle missing/None layout gracefully.

@flowerthrower
Copy link
Collaborator Author

Problem

After a Qiskit layout pass (e.g., DenseLayout), the MDP state became:

  • synthesized = True
  • laid_out = True
  • routed = False

determine_valid_actions_for_state() correctly allowed only routing actions.

However, action_masks() filtered out all TKET actions because self.layout was not None.
Since the only registered routing action was RoutingPass (from TKET), this removed the last available routing option and produced an all-False action mask.

Consequence in MaskablePPO

sb3-contrib's MaskablePPO handles an all-False mask by setting all logits to a large negative number instead of negative infinity.

Applying softmax to these equal values results in a uniform distribution, meaning every action — including terminate — had a probability of about 1/22 (~4.5%).

During training with 1000 timesteps, the probability that terminate would never be sampled is extremely small (<0.003%), making a crash almost inevitable.

This never came up in tests that only executed 100 steps.

Crash Mechanism

When the agent sampled terminate in this deadlocked state:

  1. step() re-evaluated determine_valid_actions_for_state()
  2. It correctly detected routed = False
  3. reward calculation failed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants