Update MCU SoC post-P&R test data#44

Closed
github-actions[bot] wants to merge 11 commits into main from auto/update-mcu-soc-data

@github-actions

Automated rebuild of tests/mcu_soc/data/ using librelane.

Trigger: workflow_dispatch

The mcu-soc-metal CI job will validate simulation.

Add --timing-vcd flag that produces timing-accurate VCD output where
signal transitions are offset from clock edges by their computed
arrival times. The GPU kernel already computes per-gate arrival times
for setup/hold checking; this feature writes them to global memory
so the host can produce sub-cycle-accurate output.

Changes:
- GPU kernels (Metal/CUDA): write shared_writeout_arrival to global
  memory at arrival_state_offset when enabled
- FlattenedScriptV1: add timing_arrivals_enabled, arrival_state_offset
  fields; update effective_state_size() for 3-section layout
- vcd_io: add expand_states_for_arrivals(), split_arrival_states(),
  write_output_vcd_timed() with ps-to-timescale conversion
- loom CLI: wire --timing-vcd flag, SimParams.arrival_state_offset,
  and timed VCD writer dispatch
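
The sub-cycle offsetting described above can be sketched on the host side as follows. This is a minimal illustration, not Loom's `write_output_vcd_timed()`: the names `ps_to_ticks` and `timed_events`, the tuple layout, and the round-down policy for the ps-to-timescale conversion are all assumptions.

```python
def ps_to_ticks(ps, timescale_ps):
    """Convert picoseconds to VCD timescale units.

    Rounding down is an assumption made for this sketch.
    """
    if timescale_ps <= 0:
        raise ValueError("timescale_ps must be positive")
    return ps // timescale_ps


def timed_events(edge_ps, changes, timescale_ps=1):
    """Offset each signal change from the clock edge by its arrival time.

    changes: iterable of (signal_name, new_value, arrival_ps) tuples.
    Returns (timestamp, signal_name, new_value) tuples sorted by timestamp.
    """
    events = [
        (ps_to_ticks(edge_ps + arrival_ps, timescale_ps), name, value)
        for name, value, arrival_ps in changes
    ]
    # sorted() is stable, so changes with equal timestamps keep input order.
    return sorted(events, key=lambda event: event[0])
```

With a clock edge at 10000 ps and a 1 ps timescale, a change that arrives 1323 ps after the edge would be written at timestamp 11323 instead of at the edge itself.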

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
github-actions bot force-pushed the auto/update-mcu-soc-data branch from f43e281 to 7fdcde9 February 27, 2026 12:32
robtaylor and others added 10 commits February 27, 2026 12:46
Add detailed section to Known Issues explaining why Loom only supports
edge-triggered DFFs, why CVC's test suite can't be reused as reference
tests (NAND-latch flip-flops), and what would be needed to add latch
support (new DriverType, two-phase evaluation, GPU kernel changes).

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
- Change IdCode::from(0) to IdCode(0) for vcd_ng tuple struct API
- Make write_output_vcd_timed generic over W: Write for testability
- Remove writer.flush() calls (vcd_ng::Writer has no flush method)
- Add 8 comprehensive tests for expand/split/write timing arrivals

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
The Metal kernel uses a double-buffered read pattern where t4_5 holds
the current stage's data while the next stage's data is pre-loaded. The
gate_delay extraction was incorrectly placed AFTER the t4_5 overwrite,
causing it to read the next stage's padding slot instead of the current
one. For single-stage designs (like inv_chain), this read garbage/zeros.

Fix: extract gate_delay from t4_5.c4 before overwriting t4_5.
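
The ordering problem can be modeled in a few lines of host-side Python. This is only a sketch of the read-before-overwrite pattern, not the Metal kernel: each stage is a dict with a `c4` slot holding the gate delay, and the slot prefetched past the last stage is assumed to be zero.

```python
def gate_delays_buggy(stages):
    """Read the delay after the prefetch has overwritten the buffer (the bug)."""
    delays = []
    t4_5 = stages[0]
    for i in range(len(stages)):
        # Prefetch the next stage; past the end the slot is modeled as zero.
        t4_5 = stages[i + 1] if i + 1 < len(stages) else {"c4": 0}
        delays.append(t4_5["c4"])  # wrong: this is the next stage's slot
    return delays


def gate_delays_fixed(stages):
    """Read the current stage's delay first, then overwrite the buffer."""
    delays = []
    t4_5 = stages[0]
    for i in range(len(stages)):
        delays.append(t4_5["c4"])  # current stage's delay
        t4_5 = stages[i + 1] if i + 1 < len(stages) else {"c4": 0}
    return delays
```

For a single-stage input such as `[{"c4": 61}]` (a made-up delay), the buggy ordering yields `[0]` while the fixed ordering yields `[61]`, which mirrors the single-stage failure described above.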

Also fix arrival tracking to add gate_delay even for pass-through
positions (orb == 0xFFFFFFFF) across all hierarchy levels, since
pass-throughs can represent physical cells (e.g., inverter chains)
with accumulated delays.

Also fix load_timing_from_sdf to iterate all cell origins per AIG pin
instead of only the first, enabling correct delay accumulation for
inverter chains collapsed to a single AIG wire.

Verified: inv_chain test produces correct 1323ps arrival delay matching
the analytical SDF sum (CLK→Q=350ps + 16 inverters=973ps).
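
A small sketch of why iterating over every cell origin matters. The two totals (350 ps and 973 ps) come from the figures above; the helper names and the per-origin delay list used in the usage note are hypothetical, since the individual inverter delays are not listed here.

```python
CLK_TO_Q_PS = 350   # CLK→Q delay quoted above
INV_CHAIN_PS = 973  # total for the 16 inverters quoted above


def pin_delay_first_only(origin_delays_ps):
    """Old behaviour: only the first cell origin contributes."""
    return origin_delays_ps[0] if origin_delays_ps else 0


def pin_delay_all(origin_delays_ps):
    """Fixed behaviour: every cell origin on the AIG pin contributes."""
    return sum(origin_delays_ps)


# Analytical SDF sum that the inv_chain test is checked against.
expected_arrival_ps = CLK_TO_Q_PS + INV_CHAIN_PS
```

With hypothetical origin delays `[60, 61, 62]`, `pin_delay_first_only` returns 60 and `pin_delay_all` returns 183, so a chain collapsed onto one AIG wire loses most of its delay under the old behaviour.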

Co-developed-by: Claude Code v2.1.62 (claude-opus-4-6)
Suppress unused variable warnings (staged, num_srams, num_ios, num_dup,
part_end) and remove dead assignments (offset before break, script_pi
before break) that were cluttering build output.

Co-developed-by: Claude Code v2.1.62 (claude-opus-4-6)
- tb_cvc.v: CVC testbench with SDF annotation for inv_chain timing
  validation (expected total delay: 1323ps)
- inv_chain_stimulus.vcd: Input stimulus for timing VCD tests
- compare_vcd.py: VCD comparison script for Loom vs CVC output
- watchlist.json: Signal watchlist for timing_sim_cpu tracing
- CI workflow: CVC reference simulation job for automated validation

Co-developed-by: Claude Code v2.1.62 (claude-opus-4-6)
Dockerfile builds CVC (open-src-cvc) from source on linux/amd64 with
gcc/binutils for its native code compilation. run_cvc.sh builds the
image, runs the inv_chain testbench with SDF back-annotation, and
compares against Loom's timing output.

Results: CVC reports 1235ps total delay vs Loom's 1323ps — an 88ps
(7.1%) conservative overestimate. This is expected: Loom uses
max(rise, fall) per cell since the GPU kernel processes 32 packed
signals and cannot track per-signal transition direction. CVC tracks
actual rise/fall transitions through the inverter chain.

The 88ps decomposes as:
  8 inverter stages × 10ps IOPATH rise/fall asymmetry = 80ps
  8 interconnect wires × 1ps rise/fall asymmetry = 8ps
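
The decomposition can be reproduced with a toy model of a 16-inverter chain. Only the asymmetries (10 ps per cell, 1 ps per wire) are taken from the figures above; the absolute rise/fall values and the function names are hypothetical, chosen just to make the arithmetic concrete.

```python
def tracked_delay(n, cell_rise, cell_fall, wire_rise, wire_fall, input_rising=True):
    """Sum delays while following the actual transition direction (CVC-style)."""
    total = 0
    rising = input_rising
    for _ in range(n):
        rising = not rising  # each inverter flips the edge direction
        total += cell_rise if rising else cell_fall
        total += wire_rise if rising else wire_fall
    return total


def conservative_delay(n, cell_rise, cell_fall, wire_rise, wire_fall):
    """Use max(rise, fall) for every cell and wire (the model described above)."""
    return n * (max(cell_rise, cell_fall) + max(wire_rise, wire_fall))


# Hypothetical absolute delays; only the 10 ps and 1 ps gaps come from the text.
CELL_RISE, CELL_FALL = 60, 50
WIRE_RISE, WIRE_FALL = 1, 0

gap_ps = (conservative_delay(16, CELL_RISE, CELL_FALL, WIRE_RISE, WIRE_FALL)
          - tracked_delay(16, CELL_RISE, CELL_FALL, WIRE_RISE, WIRE_FALL))
```

Because the edge direction alternates, half of the 16 stages take the faster edge, so the gap is 8 × 10 ps + 8 × 1 ps = 88 ps regardless of the absolute values chosen.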

Usage: bash tests/timing_test/cvc/run_cvc.sh

Co-developed-by: Claude Code v2.1.62 (claude-opus-4-6)
Add detailed section to timing-simulation.md covering the three
independent sources of timing overestimation:

1. max(rise, fall) per cell — GPU can't track transition direction
   across 32 packed signals (80ps / 6.5% for inv_chain)
2. max wire delay across multi-input pins — single wire delay per
   cell regardless of which input is critical (8ps for inv_chain)
3. max arrival across 32 packed signals per thread — mitigated by
   timing-aware bit packing (0ps for inv_chain, larger in practice)
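
Source 3 can be illustrated with a toy packing model. The 32-signal word width comes from the text; the arrival values are made up, and sorting by arrival time is only an assumed stand-in for timing-aware bit packing, whose actual algorithm is not described here.

```python
WORD_BITS = 32  # signals packed per thread, per the text


def pack(arrivals_ps, word_bits=WORD_BITS):
    """Split a list of per-signal arrival times into consecutive words."""
    return [arrivals_ps[i:i + word_bits]
            for i in range(0, len(arrivals_ps), word_bits)]


def overestimate_ps(words):
    """Total pessimism when every signal in a word reports the word's max arrival."""
    return sum(max(word) - arrival for word in words for arrival in word)


# Hypothetical design: 64 signals alternating between shallow and deep logic.
arrivals = [100, 900] * 32

naive = overestimate_ps(pack(arrivals))                 # mixed depths per word
timing_aware = overestimate_ps(pack(sorted(arrivals)))  # similar depths per word
```

In this made-up case the naive order charges every shallow signal the deep signal's arrival, while grouping similar arrivals into the same word removes the overestimate entirely.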

Documents CVC reference validation: Loom 1323ps vs CVC 1235ps (88ps
/ 7.1% conservative overestimate) for the inv_chain design.

Updates implementation phases to reflect completed GPU arrival
tracking and timing-aware VCD output.

Co-developed-by: Claude Code v2.1.62 (claude-opus-4-6)
40 outputs at 5 logic depths (3, 5, 9, 13, 17) exercise Source 3
overestimation in timing-aware bit packing. CVC reference shows
distinct arrival times per group (513ps to 1286ps), confirming the
conservative timing model. Includes hand-crafted SDF, stimulus VCD,
CVC testbench, and Docker runner script.

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
The previous fallback logic used `find | sort -r | head -1`, which
grabbed a pre-PnR SDF (step 08) alphabetically instead of the
post-PnR SDF from STAPostPNR (step 51) that includes interconnect
delays. Now explicitly searches for stapostpnr nom_tt SDF first.
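
A sketch of the preference order described above, written as a pure function over candidate paths. The function name, the substring matching, and the example paths in the usage note are assumptions; the real workflow is a shell script and its exact patterns are not shown here.

```python
def pick_sdf(paths):
    """Prefer a post-PnR nom_tt SDF; otherwise fall back to the old pick.

    The fallback mirrors `sort -r | head -1`, i.e. the last path in sort order.
    """
    sdfs = [p for p in paths if p.lower().endswith(".sdf")]
    preferred = sorted(
        p for p in sdfs
        if "stapostpnr" in p.lower() and "nom_tt" in p.lower()
    )
    if preferred:
        return preferred[0]
    return max(sdfs) if sdfs else None
```

Given the made-up candidates `run/08-pre-pnr/a.sdf`, `run/51-stapostpnr/nom_tt/a.sdf`, and `run/60-other/z.sdf`, the function returns the `stapostpnr` path, whereas the fallback alone would have returned `run/60-other/z.sdf`.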

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
@robtaylor
Contributor

Superseded by #55 (PnR netlist data only) + #49 (code changes, already merged to main). The code changes in this PR were duplicates from the timing-vcd-readback branch.

@robtaylor robtaylor closed this Mar 4, 2026