Add GR801 radiation-hardened SoC simulation framework scaffold #63
Pull request overview
Adds a new standalone Python module that scaffolds a lightweight GR801 radiation-hardened SoC simulation (cores, AI accelerator, sparse paged memory, radiation environment models, and a mission simulator) along with minimal pytest coverage to ensure safe import and basic functionality.
Changes:
- Introduces `gr801_simulation_framework.py` implementing GR801 system components and a mission simulation driver.
- Implements sparse paged memory to avoid multi-GB DDR allocations.
- Adds `tests/test_gr801_simulation_framework.py` with smoke tests for sparse memory behavior and batched inference I/O.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| gr801_simulation_framework.py | New GR801 simulation scaffold (cores/accelerator/memory/radiation/mission simulator) intended to be lightweight and import-safe. |
| tests/test_gr801_simulation_framework.py | Adds basic pytest coverage for sparse memory allocation behavior and batched inference result shape/structure. |
```python
if int(self.uptime * 100) % 10 == 0:
    self._log_system_state()
```
The logging condition `if int(self.uptime * 100) % 10 == 0` is true for many consecutive timesteps because of integer truncation: for `dt=1e-3`, `int(uptime * 100)` stays 0 for the first ~10 ms, and the check passes again throughout each 10 ms window that follows a multiple of 100 ms. This causes log spam and unnecessary overhead. Track the last log time (or the next scheduled log time) instead of using a modulus on a truncated value.
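A minimal sketch of the "next scheduled log time" idea, compared against the truncating check. `ScheduledLogger`, `due`, and `count_logs` are hypothetical names used only for illustration, and the 0.1 s interval is an assumption about the intended cadence.

```python
class ScheduledLogger:
    """Fires once per `interval` seconds of simulated time (hypothetical helper)."""

    def __init__(self, interval: float = 0.1) -> None:
        self.interval = interval
        self.next_log_time = 0.0

    def due(self, uptime: float) -> bool:
        if uptime < self.next_log_time:
            return False
        # Advance past the current time so one call fires at most once,
        # even if a large dt skips several intervals.
        while self.next_log_time <= uptime:
            self.next_log_time += self.interval
        return True


def count_logs(dt: float, steps: int) -> tuple[int, int]:
    """Return (logs from the truncating check, logs from the scheduled check)."""
    scheduler = ScheduledLogger(interval=0.1)
    truncating = scheduled = 0
    for step in range(1, steps + 1):
        uptime = step * dt  # multiply instead of accumulating to limit float drift
        if int(uptime * 100) % 10 == 0:
            truncating += 1
        if scheduler.due(uptime):
            scheduled += 1
    return truncating, scheduled


truncating_logs, scheduled_logs = count_logs(dt=1e-3, steps=1000)
```

Over one simulated second at `dt=1e-3`, the truncating check fires many times more often than the scheduled one, which fires roughly once per interval.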
```python
phase_thresholds: dict[str, float] = {
    "LAUNCH": 300,
    "ORBIT_INSERTION": 1800,
    "SCIENCE_OPS": 3600,
    "COMMUNICATIONS": 7200,
    "DEORBIT": self.mission_duration - 1800,
}

for phase, threshold in phase_thresholds.items():
    if self.simulation_time <= threshold:
        self.mission_phase = phase
        break
```
`_update_mission_phase()` compares `simulation_time <= threshold` against a set of thresholds that includes `DEORBIT: mission_duration - 1800`. With the current logic and insertion order, once `simulation_time` exceeds 7200 s the loop falls through to `DEORBIT` for the majority of the mission (until `mission_duration - 1800`), which is likely not intended. Consider representing the thresholds as phase start times and selecting the latest start time `<=` the current time, or otherwise adjusting the comparisons so `DEORBIT` only activates near the end.
```diff
-phase_thresholds: dict[str, float] = {
-    "LAUNCH": 300,
-    "ORBIT_INSERTION": 1800,
-    "SCIENCE_OPS": 3600,
-    "COMMUNICATIONS": 7200,
-    "DEORBIT": self.mission_duration - 1800,
-}
-for phase, threshold in phase_thresholds.items():
-    if self.simulation_time <= threshold:
-        self.mission_phase = phase
-        break
+# Interpret these values as phase *start times* in seconds.
+phase_starts: dict[str, float] = {
+    "LAUNCH": 0.0,
+    "ORBIT_INSERTION": 300.0,
+    "SCIENCE_OPS": 1800.0,
+    "COMMUNICATIONS": 3600.0,
+    # Start DEORBIT 1800 seconds before mission end (or at t=0 for very short missions).
+    "DEORBIT": max(0.0, float(self.mission_duration) - 1800.0),
+}
+# Select the latest phase whose start time is <= current simulation time.
+current_phase = self.mission_phase
+for phase, start_time in sorted(phase_starts.items(), key=lambda item: item[1]):
+    if self.simulation_time >= start_time:
+        current_phase = phase
+    else:
+        break
+self.mission_phase = current_phase
```
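To make the difference concrete, here is a standalone sketch that evaluates both selection strategies as plain functions. `phase_by_threshold` and `phase_by_start` are hypothetical names, and the 86,400 s mission duration is an assumed value chosen for illustration.

```python
from typing import Optional


def phase_by_threshold(t: float, mission_duration: float) -> Optional[str]:
    """Threshold-based selection: first phase whose threshold is >= t."""
    thresholds = {
        "LAUNCH": 300.0,
        "ORBIT_INSERTION": 1800.0,
        "SCIENCE_OPS": 3600.0,
        "COMMUNICATIONS": 7200.0,
        "DEORBIT": mission_duration - 1800.0,
    }
    for phase, threshold in thresholds.items():
        if t <= threshold:
            return phase
    return None  # no match: the reviewed code would keep the previous phase


def phase_by_start(t: float, mission_duration: float) -> str:
    """Start-time selection: latest phase whose start time is <= t."""
    starts = {
        "LAUNCH": 0.0,
        "ORBIT_INSERTION": 300.0,
        "SCIENCE_OPS": 1800.0,
        "COMMUNICATIONS": 3600.0,
        "DEORBIT": max(0.0, mission_duration - 1800.0),
    }
    current = "LAUNCH"
    for phase, start in sorted(starts.items(), key=lambda item: item[1]):
        if t >= start:
            current = phase
        else:
            break
    return current


MISSION_DURATION = 86_400.0  # assumed 24 h mission
mid_mission = (
    phase_by_threshold(10_000.0, MISSION_DURATION),
    phase_by_start(10_000.0, MISSION_DURATION),
)
```

At `t = 10_000` the threshold version already reports `DEORBIT`, while the start-time version stays in `COMMUNICATIONS` until 1800 s before the end of the mission.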
```python
logger = logging.getLogger("GR801_Simulator")
```
Repository code consistently uses `logging.getLogger(__name__)` to keep logger names hierarchical (e.g. `src/utils/analysis_utils.py:31`). Using a fixed string here (`"GR801_Simulator"`) breaks that convention and can make log configuration/filtering harder. Prefer `logging.getLogger(__name__)` (or a module-level `LOGGER`) unless there’s a specific need for a custom name.
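A small illustration of why hierarchical names help with configuration. The logger names `gr801_demo` and `gr801_demo.memory` are hypothetical stand-ins for a package and one of its modules (what `__name__` would produce inside that module).

```python
import logging

# Hypothetical names standing in for a package and one of its modules.
package_logger = logging.getLogger("gr801_demo")
module_logger = logging.getLogger("gr801_demo.memory")

# Configuring the parent is enough: the child has no level of its own,
# so it inherits the parent's effective level.
package_logger.setLevel(logging.WARNING)


def inherited_level() -> int:
    """Effective level of the module logger after configuring only the parent."""
    return module_logger.getEffectiveLevel()
```

A fixed, unrelated name such as `"GR801_Simulator"` sits outside that dotted hierarchy, so it would need to be configured separately.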
```python
def test_gr801_radiation_tolerance() -> dict[str, Any]:
    """Test GR801 radiation tolerance in different environments (quick toy run)."""

    environments = [RadiationEnvironment.LEO, RadiationEnvironment.GEO, RadiationEnvironment.JUPITER]

    results: dict[str, Any] = {}
    for env in environments:
        logger.info("Testing GR801 in %s environment", env.value)
        system = GR801System(environment=env)

        # 10 seconds simulated time in 100 ms steps.
        for _ in range(100):
            system.execute_timestep(dt=0.1)

        model = NeuralNetworkModel.get_preset_model(AIWorkload.IMAGE_CLASSIFICATION)
        input_data = np.random.standard_normal((1, *model.input_shape)).astype(np.float32)

        result = system.run_inference(AIWorkload.IMAGE_CLASSIFICATION, input_data)

        results[env.value] = {
            "final_tid": system.total_ionizing_dose,
            "seu_count": len(system.seu_log),
            "inference_accuracy": result["estimated_accuracy"],
            "system_operational": system._system_operational(),
        }

    return results
```
The function `test_gr801_radiation_tolerance()` lives in a non-test module and is invoked from `__main__`. The `test_` prefix makes it look like a pytest test (and may confuse readers/tools) even though `pytest.ini` only collects `test_*.py` under `tests/`. Consider renaming it to something like `run_gr801_radiation_tolerance()` or `simulate_radiation_tolerance()`.
```python
def run_mission(self):
    logger.info("Starting space mission simulation for %.0f seconds", self.mission_duration)

    dt = 0.1
    timesteps = int(max(1, self.mission_duration / dt))

    for step in range(timesteps):
        self.simulation_time += dt
        self._update_mission_phase()

        try:
            self.gr801_system.execute_timestep(dt)
            self._execute_mission_workload()
        except Exception as exc:  # noqa: BLE001
            logger.error("Mission failure at t=%.1fs: %s", self.simulation_time, exc)
            self.mission_success = False
            break
```
`SpaceMissionSimulator.run_mission()` runs an inference on every timestep (`dt=0.1`), which is ~864,000 inferences for a 24 h mission. Because the accelerator path allocates a new random projection matrix on each inference, this example can become extremely slow and churn memory heavily. Consider throttling workload execution (e.g., run inference every N timesteps or at a phase-specific cadence) and/or caching and reusing projection weights to keep the example practical.
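A rough sketch of both mitigations together, under stated assumptions: `CachedProjection` and `run_throttled` are hypothetical names, the projection is modeled as a plain matrix product, and the input/output sizes are arbitrary. The real accelerator path is not shown in this review, so this only illustrates the pattern.

```python
import numpy as np


class CachedProjection:
    """Reuses one random projection matrix per (in_dim, out_dim) pair."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = np.random.default_rng(seed)
        self._weights: dict[tuple[int, int], np.ndarray] = {}

    def project(self, x: np.ndarray, out_dim: int) -> np.ndarray:
        key = (x.shape[-1], out_dim)
        weights = self._weights.get(key)
        if weights is None:
            # Allocate once; later calls with the same shapes reuse it.
            weights = self._rng.standard_normal(key).astype(np.float32)
            self._weights[key] = weights
        return x @ weights


def run_throttled(timesteps: int, inference_every: int, accel: CachedProjection) -> int:
    """Run one inference every `inference_every` timesteps; return the count."""
    x = np.ones((1, 16), dtype=np.float32)  # assumed toy input
    inferences = 0
    for step in range(timesteps):
        if step % inference_every == 0:
            accel.project(x, out_dim=4)
            inferences += 1
    return inferences


accelerator = CachedProjection(seed=0)
inference_count = run_throttled(timesteps=1000, inference_every=100, accel=accelerator)
```

With `inference_every=100` and `dt=0.1`, the workload would run once per 10 simulated seconds, and only one weight matrix is ever allocated for a given shape.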
```python
def read(self, address: int, size: int) -> np.ndarray:
    """Read memory with ECC checking (simplified)."""

    data = self._mem.read(address, size)

    # Simulate occasional ECC detection/correction. We don't store ECC bits;
    # we simulate the outcomes and counters.
    for i in range(0, int(size), 8):
        chunk_addr = int(address) + i
        if self._check_ecc(chunk_addr):
            if self._correct_ecc_error(chunk_addr):
                self.errors_corrected += 1
            else:
                self.errors_detected += 1
                self._log_memory_error(chunk_addr, "UNCORRECTABLE")

    return data
```
`MemoryController.read()` performs an 8-byte chunk loop in Python (`for i in range(0, size, 8)`), which scales linearly with read size and becomes a bottleneck for large reads. If large reads are expected, consider modeling ECC outcomes in a more vectorized/batched way (e.g., sample the number of detected/corrected events for the read and update the counters accordingly) rather than iterating per chunk.
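One way to batch the sampling, sketched under the assumption that each 8-byte word fails independently with a fixed probability. `sample_ecc_events`, `p_error`, and `p_correctable` are hypothetical; the actual behavior of `_check_ecc` and `_correct_ecc_error` is not visible in this review.

```python
import numpy as np


def sample_ecc_events(
    size: int,
    rng: np.random.Generator,
    p_error: float = 1e-6,       # assumed per-word error probability
    p_correctable: float = 0.99,  # assumed share of errors that ECC can correct
    word_bytes: int = 8,
) -> tuple[int, int]:
    """Return (corrected, uncorrectable) event counts for one read."""
    # Same number of chunks as range(0, size, word_bytes) would visit.
    n_words = -(-int(size) // word_bytes)
    detected = int(rng.binomial(n_words, p_error))
    corrected = int(rng.binomial(detected, p_correctable))
    return corrected, detected - corrected


rng = np.random.default_rng(0)
corrected, uncorrectable = sample_ecc_events(size=8 * 1_000_000, rng=rng, p_error=0.01)
```

Two binomial draws replace the per-chunk loop, so the cost no longer depends on the read size. The trade-off is that individual error addresses are no longer known, so per-address logging would need a separate sampling step.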
```python
start_time = self.uptime

if self.ai_accelerator is not None:
    result = self.ai_accelerator.execute_inference(model.operations, input_data)
    self.total_power += GR801Config.NOMINAL_POWER * 0.5
else:
    result = self._cpu_inference(input_data)

execution_time = self.uptime - start_time
accuracy = self._calculate_radiation_accuracy()

self.inference_count += 1
self.total_operations += int(model.operations)

return {
    "result": result,
    "execution_time": execution_time,
    "estimated_accuracy": accuracy,
    "power_consumed": self.total_power * max(0.0, execution_time),
    "radiation_effects": len([e for e in self.seu_log if not e.corrected]),
}
```
`run_inference()` computes `execution_time` as `self.uptime - start_time`, but `self.uptime` is never advanced during inference, so `execution_time` (and thus `power_consumed`) will always be 0. This makes the returned telemetry misleading. Consider either advancing simulated time by the accelerator/CPU inference time, or computing execution time from a dedicated timing model/clock separate from `uptime`.
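A sketch of the "dedicated timing model" option, assuming execution time is simply operations divided by throughput (the same ratio `execute_inference()` computes as `_execution_time`). `InferenceClock`, `charge`, and `inference_telemetry` are hypothetical names, and reporting energy in joules is an assumption about what `power_consumed` is meant to convey.

```python
from dataclasses import dataclass


@dataclass
class InferenceClock:
    """Tracks simulated time spent in inference, independent of uptime."""

    throughput_ops_per_s: float
    elapsed: float = 0.0

    def charge(self, model_ops: int) -> float:
        """Account for one inference and return its simulated duration."""
        duration = float(model_ops) / self.throughput_ops_per_s
        self.elapsed += duration
        return duration


def inference_telemetry(clock: InferenceClock, model_ops: int, power_watts: float) -> dict[str, float]:
    """Build timing/energy telemetry from the clock instead of uptime deltas."""
    execution_time = clock.charge(model_ops)
    return {
        "execution_time": execution_time,
        "energy_joules": power_watts * execution_time,
    }


clock = InferenceClock(throughput_ops_per_s=1e9)
telemetry = inference_telemetry(clock, model_ops=2_000_000_000, power_watts=5.0)
```

Because the duration comes from the model's operation count, the telemetry is nonzero even when the caller does not step the simulation during inference.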
```python
def execute_inference(self, model_ops: int, input_data: np.ndarray) -> np.ndarray:
    """Execute neural network inference."""

    if not self.active:
        raise RuntimeError("AI Accelerator not active")

    model_ops_i = int(model_ops)
    _execution_time = float(model_ops_i) / self.throughput

    self.power_consumption = GR801Config.NOMINAL_POWER * 1.5

    result = self._simulate_matrix_operations(input_data)

    self.operations_completed += model_ops_i
    self.mac_utilization = float(self._rng.uniform(0.7, 0.95))

    self._update_temperature()
    return result
```
`AIAccelerator.execute_inference()` calculates `_execution_time` but never uses it (and callers also don’t consume it). If the simulator intends to report or advance simulated time for inference, this value should be returned/propagated or used to update system state; otherwise it should be removed to avoid implying timing is modeled when it isn’t.
```python
if model is None:
    raise RuntimeError("No model selected")

results = [np.random.standard_normal(model.output_shape).astype(np.float32) for _core in self.cores]
```
For loop variable '_core' is not used in the loop body.
```diff
-results = [np.random.standard_normal(model.output_shape).astype(np.float32) for _core in self.cores]
+results = [np.random.standard_normal(model.output_shape).astype(np.float32) for _ in range(len(self.cores))]
```
Adds a new module `gr801_simulation_framework.py` implementing a lightweight GR801 SoC simulation scaffold (cores, AI accelerator, sparse paged memory controller, radiation environment models, and mission simulator).
Key points:
Tests:
ruff check .
pytest -q tests