
Feature gem5 #353

Closed
tinebp wants to merge 3 commits into tinebp-patch-2 from feature_gem5



tinebp (Collaborator) commented May 18, 2026

No description provided.

tinebp and others added 3 commits May 18, 2026 02:48
Adds end-to-end gem5 SE-mode integration for Vortex. The simulated host
CPU (x86 or ARM) drives a VortexGPGPU device over the OPAE MMIO+DMA
command protocol; the device internally runs SimX cycle-by-cycle from
gem5's event loop. Validated via ci/regression.sh --gem5: hello,
vecadd, and sgemm pass on both ISAs in about 16 s of wall-clock time.

Three moving parts (see docs/gem5_integration.md and
docs/proposals/gem5_simx_v3_proposal.md for full design rationale):

  1. Device library (sim/simx/gem5/vortex_gpgpu.{cpp,h}, USE_GEM5=1)
     - Wraps a vortex::Processor with a C ABI the gem5 SimObject calls.
     - Full OPAE protocol state machine: cmd_args, busy bit, dcr_rsp,
       async pending_cmd dispatch.
     - Phase-2 in-process smoke driver (sim/simx/gem5/gem5_smoke_main.cpp)
       proves the library works without gem5 installed.
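A minimal sketch of what a C ABI over an OPAE-style command FSM can look like. Every name here (vx_device, vx_dev_mmio_write, vx_dev_tick, REG_*, CMD_RUN, STATUS_BUSY) is hypothetical, and FakeProcessor stands in for vortex::Processor. The sketch only illustrates the pattern described above: argument registers, a busy bit, and a command that is latched by MMIO but dispatched asynchronously on a later tick.

```cpp
#include <cstdint>

// Stand-in for vortex::Processor (hypothetical): "runs" for a fixed cycle count.
struct FakeProcessor {
  int remaining = 0;
  void start(int cycles) { remaining = cycles; }
  // Advances one cycle; returns true while the processor is still busy.
  bool cycle() { return remaining > 0 && --remaining > 0; }
};

// Hypothetical MMIO register offsets and command codes.
enum : uint32_t { REG_CMD = 0x00, REG_STATUS = 0x08, REG_ARG0 = 0x10, REG_ARG1 = 0x18 };
constexpr uint64_t CMD_RUN = 1;
constexpr uint64_t STATUS_BUSY = 1;

struct vx_device {
  FakeProcessor proc;
  uint64_t cmd_args[2] = {0, 0};
  uint64_t pending_cmd = 0;  // 0 = none; latched by MMIO, consumed by the next tick
  bool busy = false;
};

extern "C" {

vx_device* vx_dev_create() { return new vx_device(); }
void vx_dev_destroy(vx_device* d) { delete d; }

void vx_dev_mmio_write(vx_device* d, uint32_t offset, uint64_t value) {
  switch (offset) {
    case REG_ARG0: d->cmd_args[0] = value; break;
    case REG_ARG1: d->cmd_args[1] = value; break;
    case REG_CMD:  d->pending_cmd = value; d->busy = true; break;  // raise busy immediately
    default: break;  // writes to unknown offsets are ignored
  }
}

uint64_t vx_dev_mmio_read(vx_device* d, uint32_t offset) {
  if (offset == REG_STATUS) return d->busy ? STATUS_BUSY : 0;
  return 0;
}

// Called once per simulated cycle from the host simulator's event loop.
void vx_dev_tick(vx_device* d) {
  if (d->pending_cmd != 0) {  // asynchronous dispatch of the latched command
    if (d->pending_cmd == CMD_RUN)
      d->proc.start(static_cast<int>(d->cmd_args[0]));  // hypothetical: arg0 = cycles to run
    d->pending_cmd = 0;
  }
  if (d->busy) d->busy = d->proc.cycle();
}

}  // extern "C"
```

Writing REG_CMD only latches the command and raises the busy bit; the work happens in vx_dev_tick(), so the MMIO handler stays short and the host polls REG_STATUS until busy clears.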

  2. gem5 SimObject (sim/simx/gem5/vortex_gpgpu_dev.{cc,hh} + .py +
     SConscript)
     - DmaDevice subclass; dlopens libvortex-gem5.so; ticks
       Processor::cycle() from EventFunctionWrapper.
     - CMD_MEM_{READ,WRITE} -> dmaAction; CMD_RUN -> schedule tick;
       CMD_DCR_* -> synchronous library passthrough.
     - Installed into a pinned gem5 release (v25.0.0.1) by
       sim/simx/gem5/install.sh; ci/gem5_install.sh fetches and builds
       that release (both build/{X86,ARM}/gem5.opt).
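The per-command routing above can be modeled without gem5. In this hypothetical sketch, Cmd, Action, and DispatchModel are illustrative names, and each log entry stands in for what the real SimObject does (a dmaAction() call, a tick schedule, or a synchronous DCR call into the device library).

```cpp
#include <cstdint>
#include <functional>
#include <vector>

enum class Cmd : uint32_t { MEM_READ, MEM_WRITE, RUN, DCR_READ, DCR_WRITE };

// What the SimObject decides to do for each command (hypothetical model).
enum class Action { StartDma, ScheduleTick, SyncDcr };

struct DispatchModel {
  std::function<uint32_t(uint32_t)> dcr_read;         // device-library passthrough
  std::function<void(uint32_t, uint32_t)> dcr_write;  // device-library passthrough
  std::vector<Action> log;
  uint32_t last_dcr_value = 0;

  void handle(Cmd cmd, uint32_t addr, uint32_t value) {
    switch (cmd) {
      case Cmd::MEM_READ:
      case Cmd::MEM_WRITE:
        log.push_back(Action::StartDma);      // asynchronous: completes via a DMA callback
        break;
      case Cmd::RUN:
        log.push_back(Action::ScheduleTick);  // starts the cycle-by-cycle tick chain
        break;
      case Cmd::DCR_READ:
        last_dcr_value = dcr_read(addr);      // synchronous: result available immediately
        log.push_back(Action::SyncDcr);
        break;
      case Cmd::DCR_WRITE:
        dcr_write(addr, value);               // synchronous
        log.push_back(Action::SyncDcr);
        break;
    }
  }
};
```

Only the memory commands need gem5's timing machinery; DCR traffic is cheap enough to complete inside the MMIO access itself.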

  3. Host runtime (sw/runtime/gem5/{vortex.cpp,driver.{cpp,h},Makefile})
     - OPAE-shaped vx_* callbacks; direct mmap'd MMIO + bump-allocator
       pinned region.
     - HOST_ARCH switch (x86_64 / aarch64 / armhf) -> matching cross
       compiler, output to $arch/ subdir so x86 + ARM coexist.
     - All three items from the legacy vortex_gem5 bug catalog are addressed:
         B9  cache flush before download via per-core DCR_READ
         B13 multi-arch via HOST_ARCH (was hardcoded armhf in legacy)
         B14 mmio_fence() (mfence / dmb sy) centralised in issue_cmd()
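A sketch of two host-runtime pieces named above: a bump allocator over the pinned region and a single mmio_fence() used by an issue_cmd() helper. The class name BumpAllocator, the register indices inside issue_cmd(), and the 64-byte default alignment are assumptions for illustration, not the runtime's actual layout.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Full barrier between MMIO accesses: mfence on x86, dmb sy on ARM.
inline void mmio_fence() {
#if defined(__x86_64__) || defined(__i386__)
  __asm__ __volatile__("mfence" ::: "memory");
#elif defined(__aarch64__) || defined(__arm__)
  __asm__ __volatile__("dmb sy" ::: "memory");
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);  // portable fallback
#endif
}

// Bump allocator over a pinned region [base, base + size); never frees.
class BumpAllocator {
 public:
  BumpAllocator(uint64_t base, uint64_t size) : base_(base), size_(size) {}

  // Returns a device address, or nullopt if the region is exhausted.
  // align must be a power of two.
  std::optional<uint64_t> alloc(uint64_t bytes, uint64_t align = 64) {
    uint64_t start = (offset_ + align - 1) & ~(align - 1);
    if (start > size_ || bytes > size_ - start) return std::nullopt;
    offset_ = start + bytes;
    return base_ + start;
  }

 private:
  uint64_t base_, size_, offset_ = 0;
};

// Hypothetical register layout: regs[0] = command doorbell, regs[2..3] = args.
inline void issue_cmd(volatile uint64_t* regs, uint64_t cmd, uint64_t arg0, uint64_t arg1) {
  regs[2] = arg0;
  regs[3] = arg1;
  mmio_fence();  // arguments must be visible before the doorbell write
  regs[0] = cmd;
  mmio_fence();  // doorbell must land before any subsequent status poll
}
```

Centralising the barrier in issue_cmd() means individual callers cannot forget it, which is the failure mode a per-call-site fence invites.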

SimX-side prerequisites (also shared with SST integration):
  - Processor::cycle() + Memory* memsim() accessor (sim/simx/processor.*)
  - sw/common/bitmanip.h: added missing <type_traits> + <algorithm>
    includes (defensive header hygiene; the gap surfaced when gem5
    sources became the first to include constants.h transitively)

ARM e2e specifics:
  - tests/regression/common.mk + sw/runtime/stub/Makefile take the
    same HOST_ARCH switch; aarch64 binaries are suffixed (-aarch64) so
    x86 and ARM coexist in the same dir.
  - ci/gem5_test_vortex_app.py calls gem5's setInterpDir() to redirect
    the ELF interpreter (gem5's loader reads PT_INTERP directly, NOT
    via syscalls -- RedirectPath alone isn't enough) and adds
    RedirectPath entries for /lib/aarch64-linux-gnu ->
    /usr/aarch64-linux-gnu/lib (for libc/libstdc++ at runtime).

CI integration:
  - ci/regression.sh.in: new gem5() function (builds prereqs, runs
    standalone hello + e2e vecadd/sgemm, each under `timeout 120`).
    The ARM matrix is opt-in via VORTEX_GEM5_ARM=1.
  - .github/workflows/ci.yml: ci/gem5_install.sh appended to Setup
    Toolchain (cache-gated like SST), GEM5_HOME exported, gem5 entry
    added to tests matrix (excluded from xlen=64 since the device
    library is XLEN-locked).
  - VERSION: GEM5_REV=v25.0.0.1 added.
  - configure: @GEM5_REV@ substitution.

How to test:
    cd build/
    ./ci/gem5_install.sh                          # first time only
    sudo apt install -y gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
    VORTEX_GEM5_ARM=1 ./ci/regression.sh --gem5
    # Expect 6 PASSED runs in ~16s wall.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pulls upstream's Command Processor work (RTL block + pure-v2 callbacks_t
+ vortex::CommandProcessor C++ model + vortex2.h dispatcher) and rewrites
the gem5 backend on top of it. The OPAE-shaped MMIO command FSM and its
host-side ready_wait poll loop are deleted; the device now exposes only
a CP regfile (32-bit PIO) and a BAR-mapped VRAM range, and the host
runtime is a thin platform shim.

Design (docs/proposals/gem5_v2_cp_migration_proposal.md):
- Single control plane: CP regfile MMIO at PIO_BASE+0x0..0x200.
- Single data plane: host memcpy through PIN_BASE (mapped to the
  in-process simx::RAM via the SimObject's second AddrRange), so the
  host, the CP, and Vortex all see the same bytes.
- Event-driven: cpTickEvent_ and vortexTickEvent_ self-schedule only
  while their respective engines have work; the CP's vortex_start hook
  trampolines into the SimObject to schedule the Vortex tick chain.
- DevMemAccessor seam (sim/simx/gem5/dev_mem.{h,cpp}) backed by
  InProcessDevMem in v1; swappable to a gem5 DMA-port path in v2
  without touching CP hook code or Vortex memory code.
- Multi-queue PIO map from day one (MAX_QUEUES=4 reserved; v1 host
  runtime exercises Q0 only).
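The DevMemAccessor seam and the self-scheduling tick chains can be illustrated with a small model. This is a hypothetical sketch: the read/write method names on DevMemAccessor, the flat byte-vector InProcessDevMem, and the EventQueue and TickedEngine types (stand-ins for gem5's event queue and the cpTickEvent_/vortexTickEvent_ chains) are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Seam: CP hooks and Vortex memory code see only this interface,
// so the backing store can be swapped (e.g. for a DMA-port path).
class DevMemAccessor {
 public:
  virtual ~DevMemAccessor() = default;
  virtual void read(uint64_t addr, void* dst, size_t n) = 0;
  virtual void write(uint64_t addr, const void* src, size_t n) = 0;
};

// In-process backing: a flat byte array standing in for simx::RAM (no bounds checks).
class InProcessDevMem : public DevMemAccessor {
 public:
  explicit InProcessDevMem(size_t size) : bytes_(size, 0) {}
  void read(uint64_t addr, void* dst, size_t n) override { std::memcpy(dst, &bytes_[addr], n); }
  void write(uint64_t addr, const void* src, size_t n) override { std::memcpy(&bytes_[addr], src, n); }
 private:
  std::vector<uint8_t> bytes_;
};

// Minimal discrete-event queue (stand-in for the simulator's event loop).
struct EventQueue {
  uint64_t now = 0;
  std::multimap<uint64_t, std::function<void()>> events;
  void schedule(uint64_t when, std::function<void()> fn) { events.emplace(when, std::move(fn)); }
  void run() {
    while (!events.empty()) {
      auto it = events.begin();
      now = it->first;
      auto fn = std::move(it->second);
      events.erase(it);
      fn();
    }
  }
};

// An engine that reschedules its own tick only while it has work left.
struct TickedEngine {
  EventQueue& eq;
  int work = 0;
  bool scheduled = false;
  uint64_t ticks = 0;

  void kick(int units) {  // e.g. invoked from a CP "start" hook
    work += units;
    if (!scheduled) { scheduled = true; eq.schedule(eq.now + 1, [this] { tick(); }); }
  }
  void tick() {
    scheduled = false;
    ++ticks;
    if (--work > 0) { scheduled = true; eq.schedule(eq.now + 1, [this] { tick(); }); }
  }
};
```

Because TickedEngine stops rescheduling once its work is done, an idle device leaves no pending events, and a later kick() restarts the chain.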

Validated end-to-end on both ISAs:
  - X86  standalone hello / e2e vecadd -n16 / e2e sgemm -n4
  - ARM  standalone hello / e2e vecadd-aarch64 / e2e sgemm-aarch64

Merge conflict resolution:
- sw/runtime/stub/Makefile: kept HOST_ARCH switch (ours) + new v2
  dispatcher SRCS (theirs).

Cleanup:
- Deleted sim/simx/gem5/gem5_smoke_main.cpp (in-process smoke driver;
  its coverage is a subset of the gem5 standalone test).
- Deleted sim/simx/gem5/hello.c (Phase-0 ARM cross-toolchain smoke;
  the ARM regression matrix covers the same path automatically).
- Updated docs/gem5_integration.md for the CP-first design.
- Marked gem5_simx_v3_proposal.md §3/§4 superseded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Rename + generalize the per-backend gem5 and SST test runners so they
share a uniform env-var interface and a common naming convention that
makes the modal distinction explicit.

Naming layout (hostless = no host CPU; e2e = host CPU + dispatcher + CP):

                  hostless                     e2e
    gem5    gem5_run_hostless_app.py    gem5_run_app.py
    SST     sst_run_hostless_app.py     (reserved; no SST CPU integration today)

gem5 changes:
- ci/gem5_test_vortex_hello.py → ci/gem5_run_hostless_app.py: parameterized
  by VORTEX_GEM5_DEV_LIB + VORTEX_TEST_DIR + VORTEX_TEST_KERNEL (default
  kernel.vxbin). Drops the hardcoded VORTEX_GEM5_KERNEL path; any
  regression test's kernel.vxbin can now run hostless without its host
  binary.
- ci/gem5_test_vortex_app.py → ci/gem5_run_app.py (rename only).

SST changes:
- Collapse 4 hardcoded stubs (sst_test_vortex_{hello,fibonacci,vecadd,
  conform}.py) into ci/sst_run_hostless_app.py — same env-var
  interface as the gem5 hostless runner.
- Delete ci/sst_test_vortex_memHierarchy.py: not called by regression
  and the wiring recipe is preserved in
  docs/proposals/sst_simx_v3_proposal.md §6.
- Verified that USE_SST=1 builds cleanly post-merge and that the full SST
  regression matrix (hello / fibonacci / vecadd / conform) passes
  end-to-end through ci/sst_run_hostless_app.py.

Other cleanups:
- ci/regression.sh.in: rewrite gem5() + sst() entries against the new
  runner names + env vars.
- docs/gem5_integration.md: update both invocation examples and the
  reference-implementations list.
- docs/proposals/sst_simx_v3_proposal.md: add an "Implemented" status
  note recording the runner consolidation + the reserved sst_run_app.py
  slot for a future host-CPU SST integration.
- docs/proposals/gem5_v2_cp_migration_proposal.md: update validation
  reference to the new runner filename.
- sw/runtime/gem5/Makefile: drop stale vortex_opae.h / AFU_IMAGE_*
  Makefile comment block (the runtime no longer includes vortex_opae.h
  after the pure-v2 callbacks redesign).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tinebp closed this May 18, 2026