🐛 Fix RL Training and Improve Structure#573
Conversation
|
It looks like #572 has created some conflicts here. Since the |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis pull request updates action validation in the reinforcement learning module by changing the Changes
Sequence Diagram(s)sequenceDiagram
participant Agent as RL Agent
participant Env as PredictorEnv
participant Circuit as QuantumCircuit
participant Layout as Layout Object
participant CouplingMap as Device CouplingMap
Agent->>Env: Request valid actions for current state
Env->>Circuit: Check circuit properties
Env->>Layout: Retrieve layout information
Env->>Env: is_circuit_laid_out(circuit, layout)
rect rgba(100, 150, 200, 0.5)
Note over Env: Verify all logical qubits<br/>mapped to physical qubits
end
Env->>CouplingMap: Get device coupling map
Env->>Env: is_circuit_routed(circuit, coupling_map)
rect rgba(100, 150, 200, 0.5)
Note over Env: Verify directed 2-qubit gates<br/>respect device topology
end
Env->>Env: determine_valid_actions_for_state()
rect rgba(100, 150, 200, 0.5)
Note over Env: Synthesize valid actions based on<br/>synthesis, layout, routing, mapping states
end
Env->>Agent: Return list of valid action indices
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/mqt/predictor/rl/predictorenv.py`:
- Around line 435-447: The current is_circuit_laid_out only checks qubits that
appear in instructions and misses idle logical qubits; update
is_circuit_laid_out (and its TranspileLayout handling) to validate every logical
qubit from the circuit (e.g., iterate circuit.qubits or
range(circuit.num_qubits)) against layout.get_virtual_bits() instead of
iterating only instr.qubits, and return False if any circuit qubit is not
present in v2p; keep the existing fallback to layout.final_layout or
layout.initial_layout and handle missing/None layout gracefully.
- Around line 360-363: Replace the eager string concatenation in the VF2Layout
check with logger-style formatting: instead of adding
pm.property_set["VF2Layout_stop_reason"] to the message, call logger.warning
with a format string and pass pm.property_set["VF2Layout_stop_reason"] as a
separate argument (referencing VF2Layout_stop_reason, VF2LayoutStopReason,
pm.property_set and logger.warning in predictorenv.py) so the message is
constructed lazily and satisfies Ruff G003.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
src/mqt/predictor/rl/actions.pysrc/mqt/predictor/rl/predictor.pysrc/mqt/predictor/rl/predictorenv.pytests/compilation/test_integration_further_SDKs.py
| def is_circuit_laid_out(self, circuit: QuantumCircuit, layout: TranspileLayout | Layout) -> bool: | ||
| """True if every logical qubit in the circuit has a physical assignment.""" | ||
| if isinstance(layout, TranspileLayout): | ||
| # Use final_layout if available; otherwise fallback to initial_layout | ||
| layout = layout.final_layout or layout.initial_layout | ||
|
|
||
| if not only_nat_gates: | ||
| actions = self.actions_synthesis_indices + self.actions_opt_indices | ||
| if self.layout is not None: | ||
| actions += self.actions_routing_indices | ||
| return actions | ||
| v2p = layout.get_virtual_bits() | ||
| for instr in circuit.data: | ||
| for q in instr.qubits: | ||
| if q not in v2p: | ||
| # Logical qubit not assigned | ||
| return False | ||
| return True |
There was a problem hiding this comment.
Validate all logical qubits, not just those appearing in instructions.
The docstring says “every logical qubit”; idle qubits are currently skipped.
🔧 Suggested fix
- for instr in circuit.data:
- for q in instr.qubits:
- if q not in v2p:
- # Logical qubit not assigned
- return False
+ for q in circuit.qubits:
+ if q not in v2p:
+ # Logical qubit not assigned
+ return False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/mqt/predictor/rl/predictorenv.py` around lines 435 - 447, The current
is_circuit_laid_out only checks qubits that appear in instructions and misses
idle logical qubits; update is_circuit_laid_out (and its TranspileLayout
handling) to validate every logical qubit from the circuit (e.g., iterate
circuit.qubits or range(circuit.num_qubits)) against layout.get_virtual_bits()
instead of iterating only instr.qubits, and return False if any circuit qubit is
not present in v2p; keep the existing fallback to layout.final_layout or
layout.initial_layout and handle missing/None layout gracefully.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>
ProblemAfter a Qiskit layout pass (e.g., DenseLayout), the MDP state became:
However, Consequence in MaskablePPO
Applying softmax to these equal values results in a uniform distribution, meaning every action — including During training with 1000 timesteps, the probability that This never came up in tests that only executed 100 steps. Crash MechanismWhen the agent sampled
|
Description
This PR addresses critical bugs in the RL training process with the following key changes:
Structure Improvements:
Redesigned action validation logic (
predictorenv.py): Rewrotedetermine_valid_actions_for_state()with a more structured (but equivalent) state machine that explicitly tracks three circuit states (synthesized, laid_out, routed) and handles 6 different state combinations.- Added helper methods
is_circuit_laid_out()andis_circuit_routed()to replace the buggyCheckMappass with more reliable state checking. The new logic supports both the original restricted MDP and a flexible general MDP mode.Fixed type annotation (
actions.py): Correcteddo_whileparameter type fromdict[str, Circuit]toPropertySetand added missing import for Qiskit'sPropertySet.Added reproducibility (
predictor.py): Set random seed for non-test training runs to ensure reproducible results.Improved VF2Layout error handling (
predictorenv.py): Replaced assertion failures with warning logs when VF2Layout doesn't find a solution, preventing crashes during training.Test Updates: