Feature gem5 by tinebp · Pull Request #353 · vortexgpgpu/vortex

tinebp · 2026-05-18T12:51:36Z

No description provided.

Adds end-to-end gem5 SE-mode integration for Vortex. The simulated host CPU (x86 or ARM) drives a VortexGPGPU device over the OPAE MMIO+DMA command protocol; the device internally runs SimX cycle-by-cycle from gem5's event loop. Validated via ci/regression.sh --gem5: hello + vecadd + sgemm on both ISAs, 16 s wall.

Three moving parts (see docs/gem5_integration.md and docs/proposals/gem5_simx_v3_proposal.md for full design rationale):

1. Device library (sim/simx/gem5/vortex_gpgpu.{cpp,h}, USE_GEM5=1)
   - Wraps a vortex::Processor with a C ABI the gem5 SimObject calls.
   - Full OPAE protocol state machine: cmd_args, busy bit, dcr_rsp, async pending_cmd dispatch.
   - Phase-2 in-process smoke driver (sim/simx/gem5/gem5_smoke_main.cpp) proves the library works without gem5 installed.

2. gem5 SimObject (sim/simx/gem5/vortex_gpgpu_dev.{cc,hh} + .py + SConscript)
   - DmaDevice subclass; dlopens libvortex-gem5.so; ticks Processor::cycle() from EventFunctionWrapper.
   - CMD_MEM_{READ,WRITE} -> dmaAction; CMD_RUN -> schedule tick; CMD_DCR_* -> synchronous library passthrough.
   - Installed into a pinned gem5 release by sim/simx/gem5/install.sh, which ci/gem5_install.sh fetches + builds (v25.0.0.1, both build/{X86,ARM}/gem5.opt).

3. Host runtime (sw/runtime/gem5/{vortex.cpp,driver.{cpp,h},Makefile})
   - OPAE-shaped vx_* callbacks; direct mmap'd MMIO + bump-allocator pinned region.
   - HOST_ARCH switch (x86_64 / aarch64 / armhf) -> matching cross compiler, output to $arch/ subdir so x86 + ARM coexist.
   - All three legacy-vortex_gem5 bug-catalog items addressed:
       B9   cache flush before download via per-core DCR_READ
       B13  multi-arch via HOST_ARCH (was hardcoded armhf in legacy)
       B14  mmio_fence() (mfence / dmb sy) centralised in issue_cmd()

SimX-side prerequisites (also shared with SST integration):
  - Processor::cycle() + Memory* memsim() accessor (sim/simx/processor.*)
  - sw/common/bitmanip.h: added missing <type_traits> + <algorithm> includes (defensive header hygiene; was hit when gem5 sources became the first to transitively include constants.h)

ARM e2e specifics:
  - tests/regression/common.mk + sw/runtime/stub/Makefile take the same HOST_ARCH switch; aarch64 binaries are suffixed (-aarch64) so x86 and ARM coexist in the same dir.
  - ci/gem5_test_vortex_app.py calls gem5's setInterpDir() to redirect the ELF interpreter (gem5's loader reads PT_INTERP directly, NOT via syscalls -- RedirectPath alone isn't enough) and adds RedirectPath entries for /lib/aarch64-linux-gnu -> /usr/aarch64-linux-gnu/lib (for libc/libstdc++ at runtime).

CI integration:
  - ci/regression.sh.in: new gem5() function (builds prereqs, runs standalone hello + e2e vecadd/sgemm, each timeout 120). ARM matrix opt-in via VORTEX_GEM5_ARM=1.
  - .github/workflows/ci.yml: ci/gem5_install.sh appended to Setup Toolchain (cache-gated like SST), GEM5_HOME exported, gem5 entry added to tests matrix (excluded from xlen=64 since the device library is XLEN-locked).
  - VERSION: GEM5_REV=v25.0.0.1 added.
  - configure: @GEM5_REV@ substitution.

How to test:
  cd build/
  ./ci/gem5_install.sh   # first time only
  sudo apt install -y gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
  VORTEX_GEM5_ARM=1 ./ci/regression.sh --gem5
  # Expect 6 PASSED runs in ~16s wall.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
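
For readers unfamiliar with the "bump-allocator pinned region" mentioned for the host runtime, here is a minimal Python sketch of the general technique. It is not the code in sw/runtime/gem5 (which is C++). The class name, the 64-byte default alignment, and the base and size used below are all assumptions made for illustration.

```python
class BumpAllocator:
    """Toy bump allocator: hands out aligned addresses from one fixed region.

    Hypothetical model only. Individual frees are not supported; the whole
    region is recycled at once with reset().
    """

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.limit = base + size
        self.next = base

    def alloc(self, nbytes: int, align: int = 64) -> int:
        if nbytes <= 0:
            raise ValueError("nbytes must be positive")
        if align <= 0 or align & (align - 1):
            raise ValueError("align must be a power of two")
        # Round the cursor up to the requested alignment.
        start = (self.next + align - 1) & ~(align - 1)
        if start + nbytes > self.limit:
            raise MemoryError("pinned region exhausted")
        # "Bump" the cursor past the new allocation.
        self.next = start + nbytes
        return start

    def reset(self) -> None:
        self.next = self.base


pinned = BumpAllocator(base=0x1000, size=256)  # assumed base and size
first = pinned.alloc(10)    # 0x1000
second = pinned.alloc(10)   # 0x1040, rounded up to the next 64-byte boundary
```

The appeal of this scheme for a pinned DMA window is that allocation is a single add-and-compare, at the cost of not being able to free buffers individually.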

Pulls upstream's Command Processor work (RTL block + pure-v2 callbacks_t + vortex::CommandProcessor C++ model + vortex2.h dispatcher) and rewrites the gem5 backend on top of it. The OPAE-shaped MMIO command FSM and its host-side ready_wait poll loop are deleted; the device now exposes only a CP regfile (32-bit PIO) and a BAR-mapped VRAM range, and the host runtime is a thin platform shim.

Design (docs/proposals/gem5_v2_cp_migration_proposal.md):
  - Single control plane: CP regfile MMIO at PIO_BASE+0x0..0x200.
  - Single data plane: host memcpy through PIN_BASE (mapped to in-process simx::RAM via SimObject second AddrRange); these are the same bytes the CP and Vortex see.
  - Event-driven: cpTickEvent_ and vortexTickEvent_ self-schedule only while their respective engines have work; the CP's vortex_start hook trampolines into the SimObject to schedule the Vortex tick chain.
  - DevMemAccessor seam (sim/simx/gem5/dev_mem.{h,cpp}) backed by InProcessDevMem in v1; swappable to a gem5 DMA-port path in v2 without touching CP hook code or Vortex memory code.
  - Multi-queue PIO map from day one (MAX_QUEUES=4 reserved; v1 host runtime exercises Q0 only).

Validated end-to-end on both ISAs:
  - X86 standalone hello / e2e vecadd -n16 / e2e sgemm -n4
  - ARM standalone hello / e2e vecadd-aarch64 / e2e sgemm-aarch64

Merge conflict resolution:
  - sw/runtime/stub/Makefile: kept HOST_ARCH switch (ours) + new v2 dispatcher SRCS (theirs).

Cleanup:
  - Deleted sim/simx/gem5/gem5_smoke_main.cpp (in-process smoke driver; its coverage is a subset of the gem5 standalone test).
  - Deleted sim/simx/gem5/hello.c (Phase-0 ARM cross-toolchain smoke; the ARM regression matrix covers the same path automatically).
  - Updated docs/gem5_integration.md for the CP-first design.
  - Marked gem5_simx_v3_proposal.md §3/§4 superseded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
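
The event-driven design described in this commit (tick events that reschedule themselves only while their engine has work, with the CP starting the Vortex tick chain) can be illustrated with a toy model. This is a sketch in plain Python and does not use gem5's C++ event API. `EventQueue`, `TickedEngine`, `kick`, and the cycle counts are hypothetical names and values chosen for the example.

```python
import heapq


class EventQueue:
    """Minimal discrete-event queue ordered by time, then insertion order."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def schedule(self, when, callback):
        heapq.heappush(self._heap, (when, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._heap:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()


class TickedEngine:
    """Engine whose tick event re-arms itself only while work remains."""

    def __init__(self, queue, name, log, on_done=None):
        self.queue, self.name, self.log, self.on_done = queue, name, log, on_done
        self.pending = 0
        self.armed = False

    def kick(self, cycles):
        """Add work; arm the tick event if it is not already scheduled."""
        self.pending += cycles
        if not self.armed:
            self.armed = True
            self.queue.schedule(self.queue.now + 1, self._tick)

    def _tick(self):
        self.pending -= 1
        self.log.append((self.queue.now, self.name))
        if self.pending > 0:
            self.queue.schedule(self.queue.now + 1, self._tick)
        else:
            self.armed = False  # idle: no further events are scheduled
            if self.on_done:
                self.on_done()


def demo():
    log = []
    queue = EventQueue()
    vortex = TickedEngine(queue, "vortex", log)
    # Stand-in for the CP's vortex_start hook: when the CP finishes,
    # it starts the Vortex tick chain.
    cp = TickedEngine(queue, "cp", log, on_done=lambda: vortex.kick(3))
    cp.kick(2)
    queue.run()
    return log
```

Calling `demo()` returns two CP ticks followed by three Vortex ticks, after which the queue is empty. An idle device therefore costs the simulator nothing, which is the point of self-scheduling over unconditional per-cycle ticking.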

Rename + generalize the per-backend gem5 and SST test runners so they share a uniform env-var interface and a common naming convention that makes the modal distinction explicit.

Naming layout (hostless = no host CPU; e2e = host CPU + dispatcher + CP):

          hostless                     e2e
  gem5    gem5_run_hostless_app.py     gem5_run_app.py
  SST     sst_run_hostless_app.py      (reserved; no SST CPU integration today)

gem5 changes:
  - ci/gem5_test_vortex_hello.py → ci/gem5_run_hostless_app.py: parameterized by VORTEX_GEM5_DEV_LIB + VORTEX_TEST_DIR + VORTEX_TEST_KERNEL (default kernel.vxbin). Drops the hardcoded VORTEX_GEM5_KERNEL path; any regression test's kernel.vxbin can now run hostless without its host binary.
  - ci/gem5_test_vortex_app.py → ci/gem5_run_app.py (rename only).

SST changes:
  - Collapse 4 hardcoded stubs (sst_test_vortex_{hello,fibonacci,vecadd,conform}.py) into ci/sst_run_hostless_app.py, with the same env-var interface as the gem5 hostless runner.
  - Delete ci/sst_test_vortex_memHierarchy.py: not called by regression and the wiring recipe is preserved in docs/proposals/sst_simx_v3_proposal.md §6.
  - Verify USE_SST=1 builds clean post-merge; full SST regression matrix (hello / fibonacci / vecadd / conform) passes end-to-end through ci/sst_run_hostless_app.py.

Other cleanups:
  - ci/regression.sh.in: rewrite gem5() + sst() entries against the new runner names + env vars.
  - docs/gem5_integration.md: update both invocation examples and the reference-implementations list.
  - docs/proposals/sst_simx_v3_proposal.md: add an "Implemented" status note recording the runner consolidation + the reserved sst_run_app.py slot for a future host-CPU SST integration.
  - docs/proposals/gem5_v2_cp_migration_proposal.md: update validation reference to the new runner filename.
  - sw/runtime/gem5/Makefile: drop stale vortex_opae.h / AFU_IMAGE_* Makefile comment block (the runtime no longer includes vortex_opae.h after the pure-v2 callbacks redesign).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
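
As an illustration of the env-var interface this commit describes for the hostless runners, the sketch below shows one way a runner could resolve its inputs. The variable names and the `kernel.vxbin` default come from the commit message. The function name, the returned dictionary, and the choice to treat the first two variables as required are assumptions, not the contents of ci/gem5_run_hostless_app.py.

```python
import os
from pathlib import Path


def resolve_hostless_config(env=None):
    """Resolve the device library and kernel image from environment variables.

    Hypothetical helper. Pass a dict as `env` to make the lookup testable
    without touching the real process environment.
    """
    env = os.environ if env is None else env
    # Assumption: both variables are mandatory for a hostless run.
    required = ("VORTEX_GEM5_DEV_LIB", "VORTEX_TEST_DIR")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # VORTEX_TEST_KERNEL is optional and defaults to kernel.vxbin.
    kernel_name = env.get("VORTEX_TEST_KERNEL", "kernel.vxbin")
    return {
        "dev_lib": Path(env["VORTEX_GEM5_DEV_LIB"]),
        "kernel": Path(env["VORTEX_TEST_DIR"]) / kernel_name,
    }


# Example with made-up paths:
example = resolve_hostless_config({
    "VORTEX_GEM5_DEV_LIB": "/opt/vortex/libvortex-gem5.so",
    "VORTEX_TEST_DIR": "/opt/vortex/tests/vecadd",
})
```

Because the kernel path is derived from the test directory, the same runner can be pointed at any regression test's directory without a per-test script, which is the consolidation the commit describes.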

tinebp and others added 3 commits May 18, 2026 02:48

tinebp closed this May 18, 2026

tinebp wants to merge 3 commits into tinebp-patch-2 from feature_gem5

tinebp commented May 18, 2026
