Feature gem5 #353
Closed
tinebp wants to merge 3 commits into
Adds end-to-end gem5 SE-mode integration for Vortex. The simulated host
CPU (x86 or ARM) drives a VortexGPGPU device over the OPAE MMIO+DMA
command protocol; the device internally runs SimX cycle-by-cycle from
gem5's event loop. Validated via ci/regression.sh --gem5: hello +
vecadd + sgemm on both ISAs, 16 s wall.
Three moving parts (see docs/gem5_integration.md and
docs/proposals/gem5_simx_v3_proposal.md for full design rationale):
1. Device library (sim/simx/gem5/vortex_gpgpu.{cpp,h}, USE_GEM5=1)
- Wraps a vortex::Processor with a C ABI the gem5 SimObject calls.
- Full OPAE protocol state machine: cmd_args, busy bit, dcr_rsp,
async pending_cmd dispatch.
- Phase-2 in-process smoke driver (sim/simx/gem5/gem5_smoke_main.cpp)
proves the library works without gem5 installed.
2. gem5 SimObject (sim/simx/gem5/vortex_gpgpu_dev.{cc,hh} + .py +
SConscript)
- DmaDevice subclass; dlopens libvortex-gem5.so; ticks
Processor::cycle() from EventFunctionWrapper.
- CMD_MEM_{READ,WRITE} -> dmaAction; CMD_RUN -> schedule tick;
CMD_DCR_* -> synchronous library passthrough.
- Installed into a pinned gem5 release by sim/simx/gem5/install.sh,
which ci/gem5_install.sh fetches + builds (v25.0.0.1, both
build/{X86,ARM}/gem5.opt).
3. Host runtime (sw/runtime/gem5/{vortex.cpp,driver.{cpp,h},Makefile})
- OPAE-shaped vx_* callbacks; direct mmap'd MMIO + bump-allocator
pinned region.
- HOST_ARCH switch (x86_64 / aarch64 / armhf) -> matching cross
compiler, output to $arch/ subdir so x86 + ARM coexist.
- All three legacy-vortex_gem5 bug-catalog items addressed:
B9 cache flush before download via per-core DCR_READ
B13 multi-arch via HOST_ARCH (was hardcoded armhf in legacy)
B14 mmio_fence() (mfence / dmb sy) centralised in issue_cmd()
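
For readers unfamiliar with the command protocol described in the three parts above, here is a minimal Python model of it. It is a sketch, not the C++ implementation: DeviceModel, Cmd, write_cmd, complete_pending, and the max_polls bound are hypothetical names chosen for illustration, and the real register layout is not modelled. It only shows the split the description names: DCR commands complete synchronously, while memory and run commands set the busy bit and finish later through pending_cmd.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Cmd(Enum):
    # Names mirror the CMD_* groups in the description; the set is illustrative.
    MEM_READ = auto()
    MEM_WRITE = auto()
    RUN = auto()
    DCR_WRITE = auto()
    DCR_READ = auto()


@dataclass
class DeviceModel:
    """Toy model of the device-side command state machine (assumption, not the real code)."""
    cmd_args: list = field(default_factory=list)
    busy: bool = False
    pending_cmd: Optional[Cmd] = None
    dcr_rsp: int = 0
    dcrs: dict = field(default_factory=dict)
    completed: list = field(default_factory=list)

    def write_cmd(self, cmd, args):
        if self.busy:
            raise RuntimeError("command issued while device is busy")
        self.cmd_args = list(args)
        if cmd is Cmd.DCR_WRITE:
            # Synchronous passthrough: takes effect immediately.
            addr, value = args
            self.dcrs[addr] = value
        elif cmd is Cmd.DCR_READ:
            (addr,) = args
            self.dcr_rsp = self.dcrs.get(addr, 0)
        else:
            # Asynchronous: latch the command and raise the busy bit.
            self.pending_cmd = cmd
            self.busy = True

    def complete_pending(self):
        # Stand-in for a DMA completion or the final simulated cycle.
        if self.pending_cmd is None:
            return
        self.completed.append(self.pending_cmd)
        self.pending_cmd = None
        self.busy = False


def issue_cmd(dev, cmd, args, max_polls=1000):
    """Host side: write the command, then poll the busy bit until it clears."""
    dev.write_cmd(cmd, args)
    polls = 0
    while dev.busy:
        if polls >= max_polls:
            raise TimeoutError("device stayed busy")
        dev.complete_pending()  # in a real run, simulated time advances here
        polls += 1
    return polls
```

In this model a DCR command returns after zero polls and a RUN command after one, which is the behaviour the busy bit is meant to expose to the host.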
SimX-side prerequisites (also shared with SST integration):
- Processor::cycle() + Memory* memsim() accessor (sim/simx/processor.*)
- sw/common/bitmanip.h: added missing <type_traits> + <algorithm>
includes (defensive header hygiene; was hit when gem5 sources
became the first to transitively include constants.h)
ARM e2e specifics:
- tests/regression/common.mk + sw/runtime/stub/Makefile take the
same HOST_ARCH switch; aarch64 binaries are suffixed (-aarch64) so
x86 and ARM coexist in the same dir.
- ci/gem5_test_vortex_app.py calls gem5's setInterpDir() to redirect
the ELF interpreter (gem5's loader reads PT_INTERP directly, NOT
via syscalls -- RedirectPath alone isn't enough) and adds
RedirectPath entries for /lib/aarch64-linux-gnu ->
/usr/aarch64-linux-gnu/lib (for libc/libstdc++ at runtime).
CI integration:
- ci/regression.sh.in: new gem5() function (builds prereqs, runs
standalone hello + e2e vecadd/sgemm, each timeout 120). ARM matrix
opt-in via VORTEX_GEM5_ARM=1.
- .github/workflows/ci.yml: ci/gem5_install.sh appended to Setup
Toolchain (cache-gated like SST), GEM5_HOME exported, gem5 entry
added to tests matrix (excluded from xlen=64 since the device
library is XLEN-locked).
- VERSION: GEM5_REV=v25.0.0.1 added.
- configure: @GEM5_REV@ substitution.
How to test:
cd build/
./ci/gem5_install.sh # first time only
sudo apt install -y gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
VORTEX_GEM5_ARM=1 ./ci/regression.sh --gem5
# Expect 6 PASSED runs in ~16s wall.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls upstream's Command Processor work (RTL block + pure-v2 callbacks_t
+ vortex::CommandProcessor C++ model + vortex2.h dispatcher) and rewrites
the gem5 backend on top of it. The OPAE-shaped MMIO command FSM and its
host-side ready_wait poll loop are deleted; the device now exposes only
a CP regfile (32-bit PIO) and a BAR-mapped VRAM range, and the host
runtime is a thin platform shim.
Design (docs/proposals/gem5_v2_cp_migration_proposal.md):
- Single control plane: CP regfile MMIO at PIO_BASE+0x0..0x200.
- Single data plane: host memcpy through PIN_BASE (mapped to in-process
simx::RAM via SimObject second AddrRange) — same bytes the CP and
Vortex see.
- Event-driven: cpTickEvent_ and vortexTickEvent_ self-schedule only
while their respective engines have work; the CP's vortex_start hook
trampolines into the SimObject to schedule the Vortex tick chain.
- DevMemAccessor seam (sim/simx/gem5/dev_mem.{h,cpp}) backed by
InProcessDevMem in v1; swappable to a gem5 DMA-port path in v2
without touching CP hook code or Vortex memory code.
- Multi-queue PIO map from day one (MAX_QUEUES=4 reserved; v1 host
runtime exercises Q0 only).
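
The event-driven scheduling in the design list can be illustrated with a small Python model. This is a sketch under stated assumptions, not the gem5 SimObject: EventQueue, Engine, and on_done are hypothetical names, and each "tick" simply retires one unit of work. It shows the two properties the design calls out: a tick event reschedules itself only while its engine still has work, and the CP's completion hook is what starts the Vortex tick chain.

```python
import heapq
import itertools


class EventQueue:
    """Minimal discrete-event queue ordered by time, then insertion order."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._seq = itertools.count()

    def schedule(self, when, callback):
        heapq.heappush(self._heap, (when, next(self._seq), callback))

    def empty(self):
        return not self._heap

    def run(self):
        while self._heap:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()
        return self.now


class Engine:
    """An engine whose tick event exists only while it has work left."""

    def __init__(self, queue, name, trace, on_done=None):
        self.queue = queue
        self.name = name
        self.trace = trace
        self.on_done = on_done
        self.work = 0
        self.scheduled = False

    def start(self, cycles):
        self.work += cycles
        if not self.scheduled:
            self.scheduled = True
            self.queue.schedule(self.queue.now + 1, self._tick)

    def _tick(self):
        self.work -= 1
        self.trace.append((self.queue.now, self.name))
        if self.work > 0:
            # Still busy: keep the tick chain alive.
            self.queue.schedule(self.queue.now + 1, self._tick)
        else:
            # Idle: stop ticking, then run the completion hook if any.
            self.scheduled = False
            if self.on_done is not None:
                self.on_done()


def run_demo():
    queue = EventQueue()
    trace = []
    vortex = Engine(queue, "vortex", trace)
    # The CP's hook trampolines into the Vortex engine when the CP finishes.
    cp = Engine(queue, "cp", trace, on_done=lambda: vortex.start(3))
    cp.start(2)
    end = queue.run()
    return end, trace, queue.empty()
```

Once both engines are idle the queue is empty, so an idle device adds no events to the simulation.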
Validated end-to-end on both ISAs:
- X86 standalone hello / e2e vecadd -n16 / e2e sgemm -n4
- ARM standalone hello / e2e vecadd-aarch64 / e2e sgemm-aarch64
Merge conflict resolution:
- sw/runtime/stub/Makefile: kept HOST_ARCH switch (ours) + new v2
dispatcher SRCS (theirs).
Cleanup:
- Deleted sim/simx/gem5/gem5_smoke_main.cpp (in-process smoke driver;
its coverage is a subset of the gem5 standalone test).
- Deleted sim/simx/gem5/hello.c (Phase-0 ARM cross-toolchain smoke;
the ARM regression matrix covers the same path automatically).
- Updated docs/gem5_integration.md for the CP-first design.
- Marked gem5_simx_v3_proposal.md §3/§4 superseded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename + generalize the per-backend gem5 and SST test runners so they
share a uniform env-var interface and a common naming convention that
makes the run mode (hostless vs. e2e) explicit.
Naming layout (hostless = no host CPU; e2e = host CPU + dispatcher + CP):
          hostless                    e2e
  gem5    gem5_run_hostless_app.py    gem5_run_app.py
  SST     sst_run_hostless_app.py     (reserved; no SST CPU integration today)
gem5 changes:
- ci/gem5_test_vortex_hello.py → ci/gem5_run_hostless_app.py: parameterized
by VORTEX_GEM5_DEV_LIB + VORTEX_TEST_DIR + VORTEX_TEST_KERNEL (default
kernel.vxbin). Drops the hardcoded VORTEX_GEM5_KERNEL path; any
regression test's kernel.vxbin can now run hostless without its host
binary.
- ci/gem5_test_vortex_app.py → ci/gem5_run_app.py (rename only).
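
A minimal sketch of how a hostless runner might resolve these variables. The three variable names and the kernel.vxbin default come from the description above. Everything else is an assumption for illustration: the function name resolve_hostless_config, treating VORTEX_GEM5_DEV_LIB and VORTEX_TEST_DIR as required, and resolving the kernel relative to VORTEX_TEST_DIR. The actual script may differ.

```python
import os
from pathlib import Path

DEFAULT_KERNEL = "kernel.vxbin"
# Assumption: these two have no default and must be set by the caller.
REQUIRED_VARS = ("VORTEX_GEM5_DEV_LIB", "VORTEX_TEST_DIR")


def resolve_hostless_config(env=None):
    """Return the device library and kernel paths derived from env vars."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing environment variable(s): " + ", ".join(missing))
    test_dir = Path(env["VORTEX_TEST_DIR"])
    # An unset or empty VORTEX_TEST_KERNEL falls back to the default name.
    kernel_name = env.get("VORTEX_TEST_KERNEL") or DEFAULT_KERNEL
    return {
        "dev_lib": Path(env["VORTEX_GEM5_DEV_LIB"]),
        "kernel": test_dir / kernel_name,
    }
```

Passing a plain dict instead of os.environ keeps the lookup testable without touching the process environment.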
SST changes:
- Collapse 4 hardcoded stubs (sst_test_vortex_{hello,fibonacci,vecadd,
conform}.py) into ci/sst_run_hostless_app.py — same env-var
interface as the gem5 hostless runner.
- Delete ci/sst_test_vortex_memHierarchy.py: not called by regression
and the wiring recipe is preserved in
docs/proposals/sst_simx_v3_proposal.md §6.
- Verify USE_SST=1 builds clean post-merge; full SST regression matrix
(hello / fibonacci / vecadd / conform) passes end-to-end through
ci/sst_run_hostless_app.py.
Other cleanups:
- ci/regression.sh.in: rewrite gem5() + sst() entries against the new
runner names + env vars.
- docs/gem5_integration.md: update both invocation examples and the
reference-implementations list.
- docs/proposals/sst_simx_v3_proposal.md: add an "Implemented" status
note recording the runner consolidation + the reserved sst_run_app.py
slot for a future host-CPU SST integration.
- docs/proposals/gem5_v2_cp_migration_proposal.md: update validation
reference to the new runner filename.
- sw/runtime/gem5/Makefile: drop stale vortex_opae.h / AFU_IMAGE_*
Makefile comment block (the runtime no longer includes vortex_opae.h
after the pure-v2 callbacks redesign).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>