
Add JIT code generator for PPC64 #320

Open
cyrozap wants to merge 11 commits into tevador:master from cyrozap:power-jit


cyrozap commented Apr 4, 2026

Adds a JIT backend for POWER8 and later Power ISA CPUs. Assembly instructions were restricted to those available in Power ISA v2.06 in order to facilitate adding support for POWER7, but currently only RandomX V1 is supported on POWER7 due to its lack of AES instructions.

Support has been added for both little-endian and big-endian CPUs, but only little-endian has been tested.

Fixes #132

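To make the POWER7/POWER8 split above concrete, here is a minimal sketch (not code from this PR) of how the described support levels map onto the Linux hardware-capability words. It assumes the bit values for PPC_FEATURE_HAS_VSX and PPC_FEATURE2_VEC_CRYPTO from the kernel's asm/cputable.h; the enum and function names are hypothetical.

```cpp
#include <cstdint>

// Bit values as defined in Linux arch/powerpc/include/uapi/asm/cputable.h.
constexpr std::uint64_t kHasVsx    = 0x00000080; // AT_HWCAP:  PPC_FEATURE_HAS_VSX (POWER7 and later)
constexpr std::uint64_t kVecCrypto = 0x02000000; // AT_HWCAP2: PPC_FEATURE2_VEC_CRYPTO (POWER8 and later)

// Hypothetical summary of which RandomX versions the JIT could run.
enum class JitSupport { None, V1Only, V1AndV2 };

constexpr JitSupport jitSupport(std::uint64_t hwcap, std::uint64_t hwcap2) {
    // The PPC64 JIT backend needs VSX, so CPUs older than POWER7 are excluded.
    if ((hwcap & kHasVsx) == 0) return JitSupport::None;
    // Without the vector crypto (AES) instructions, only V1 is supported for now.
    if ((hwcap2 & kVecCrypto) == 0) return JitSupport::V1Only;
    return JitSupport::V1AndV2;
}
```

On Linux the two arguments would come from getauxval(AT_HWCAP) and getauxval(AT_HWCAP2), declared in <sys/auxv.h>.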
cyrozap mentioned this pull request Apr 4, 2026
cyrozap (Author) commented Apr 5, 2026

Benchmarks on a Raptor Computing Systems Talos II with dual POWER9 CPUs:

./randomx-benchmark --auto --verify:

RandomX benchmark v2.0
 - Argon2 implementation: reference
 - light memory mode (256 MiB)
 - JIT compiled mode
 - hardware AES mode
 - small pages mode
 - batch mode
Initializing ...
Memory initialized in 1.01555 s
Initializing 1 virtual machine(s) ...
Running benchmark (1000 nonces) ...
Calculated result: 10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1
Reference result:  10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1
Performance: 25.3256 ms per hash

./randomx-benchmark --auto --verify --v2:

RandomX benchmark v2.0
 - Argon2 implementation: reference
 - light memory mode (256 MiB)
 - JIT compiled mode
 - hardware AES mode
 - small pages mode
 - batch mode
Initializing ...
Memory initialized in 1.0161 s
Initializing 1 virtual machine(s) ...
Running benchmark (1000 nonces) ...
Calculated result: b85d79e080b10b6ad28c2e6c993601a1361917dba979e03a0a8f7248aaf4ba52
Reference result:  b85d79e080b10b6ad28c2e6c993601a1361917dba979e03a0a8f7248aaf4ba52
Performance: 26.8799 ms per hash

./randomx-benchmark --auto --mine:

RandomX benchmark v2.0
 - Argon2 implementation: reference
 - full memory mode (2080 MiB)
 - JIT compiled mode
 - hardware AES mode
 - small pages mode
 - batch mode
Initializing (144 threads) ...
Memory initialized in 2.69269 s
Initializing 1 virtual machine(s) ...
Running benchmark (1000 nonces) ...
Calculated result: 10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1
Reference result:  10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1
Performance: 237.531 hashes per second

./randomx-benchmark --auto --mine --v2:

RandomX benchmark v2.0
 - Argon2 implementation: reference
 - full memory mode (2080 MiB)
 - JIT compiled mode
 - hardware AES mode
 - small pages mode
 - batch mode
Initializing (144 threads) ...
Memory initialized in 2.71528 s
Initializing 1 virtual machine(s) ...
Running benchmark (1000 nonces) ...
Calculated result: b85d79e080b10b6ad28c2e6c993601a1361917dba979e03a0a8f7248aaf4ba52
Reference result:  b85d79e080b10b6ad28c2e6c993601a1361917dba979e03a0a8f7248aaf4ba52
Performance: 177.077 hashes per second
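Light mode reports milliseconds per hash while full-memory mode reports hashes per second. The small sketch below is only arithmetic on the figures posted above: it converts them to a common unit and computes the V2/V1 throughput ratio. On this machine V2 comes out roughly 6% slower than V1 in light mode and roughly 25% slower in full-memory mode.

```cpp
// Converts a "ms per hash" figure into hashes per second.
constexpr double hashesPerSecond(double msPerHash) { return 1000.0 / msPerHash; }

// Throughput of V2 relative to V1 (1.0 means the same speed).
constexpr double relativeThroughput(double v1, double v2) { return v2 / v1; }

// Figures from the Talos II (dual POWER9) runs above, 1 VM each.
constexpr double lightV1 = hashesPerSecond(25.3256);
constexpr double lightV2 = hashesPerSecond(26.8799);
constexpr double fullV1 = 237.531;
constexpr double fullV2 = 177.077;

constexpr double lightRatio = relativeThroughput(lightV1, lightV2);
constexpr double fullRatio = relativeThroughput(fullV1, fullV2);
```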

SChernykh (Collaborator) commented:

> due to its lack of AES instructions.

RandomX v1 also uses AES in the scratchpad hash/fill step, so you can use the existing soft AES code for the RandomX v2 loop. It shouldn't be that hard compared to the full JIT implementation that you've done already.

cyrozap added 3 commits April 5, 2026 11:51
This only saves one or two instructions, but there are no drawbacks to
how this optimization is implemented so there's no reason not to do it.
tevador (Owner) commented Apr 6, 2026

FYI, the ppc64le build is failing: https://github.com/tevador/RandomX/actions/runs/24008647458/job/70025217444

cyrozap added 2 commits April 6, 2026 10:08
This only saves one or two instructions in a very cold path in the code,
but there are no drawbacks to implementing this optimization so there's
no reason not to do it.
This optimization can save one or two instructions for some immediates.
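The commit messages don't say which immediates benefit, so the following is only a general illustration, not necessarily what these commits do. It shows why the cost of materializing a sign-extended 32-bit immediate on PPC64 varies: li covers signed 16-bit values, lis alone covers values whose low 16 bits are zero, and everything else needs lis followed by ori. The function name is hypothetical.

```cpp
#include <cstdint>

// Counts the instructions needed to load a sign-extended 32-bit immediate
// into a 64-bit GPR on PPC64 (illustrative only).
constexpr int instructionsToLoadImm32(std::int32_t imm) {
    // li rD, SIMM: sign-extends a 16-bit immediate.
    if (imm >= -32768 && imm <= 32767) return 1;
    // lis rD, SIMM: loads SIMM << 16, sign-extended to 64 bits;
    // sufficient on its own when the low half is zero.
    if ((static_cast<std::uint32_t>(imm) & 0xFFFFu) == 0) return 1;
    // lis rD, hi; ori rD, rD, lo: ori zero-extends lo, so the result is
    // still the sign-extended 32-bit value.
    return 2;
}
```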
cyrozap (Author) commented Apr 6, 2026

> FYI, the ppc64le build is failing: https://github.com/tevador/RandomX/actions/runs/24008647458/job/70025217444

From the CI log:

/home/runner/work/RandomX/RandomX/src/cpu.cpp:52:18: fatal error: asm/cputable.h: No such file or directory
   52 |         #include <asm/cputable.h>
      |                  ^~~~~~~~~~~~~~~~

asm/cputable.h is a Linux kernel header, so I think you would need to run apk add linux-headers in the ppc64le VM setup script to resolve that error.

We could also avoid the dependency entirely by #define-ing the value ourselves. This constant is needed in both src/cpu.cpp and src/jit_compiler_ppc64.cpp, so if you want to do this we'd have to make that change in both places (or make our own header that gets included into both files).

Edit: I just realized I didn't actually need to call getauxval a second time, so I pushed a change to fix that. Now the constant is only needed in src/cpu.cpp, and in the JIT backend we just query the cpu object for feature support when we need it.

Which option would you prefer?
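The second option (defining the constants ourselves instead of including asm/cputable.h) could look roughly like this sketch. The bit values are the ones in the Linux kernel's powerpc uapi cputable.h; the helper function names are hypothetical and not taken from the PR.

```cpp
#if defined(__linux__) && defined(__powerpc64__)
#include <sys/auxv.h> // getauxval(), AT_HWCAP, AT_HWCAP2
#endif

// Local copies of the two hwcap bits, with values taken from the Linux header
// arch/powerpc/include/uapi/asm/cputable.h. The #ifndef guards keep these from
// clashing with a definition that a system header may already provide.
#ifndef PPC_FEATURE_HAS_VSX
#define PPC_FEATURE_HAS_VSX 0x00000080
#endif
#ifndef PPC_FEATURE2_VEC_CRYPTO
#define PPC_FEATURE2_VEC_CRYPTO 0x02000000
#endif

#if defined(__linux__) && defined(__powerpc64__)
// Hypothetical helpers; the PR's cpu.cpp may organize this differently.
inline bool cpuHasVsx() {
    return (getauxval(AT_HWCAP) & PPC_FEATURE_HAS_VSX) != 0;
}
inline bool cpuHasVecCrypto() {
    return (getauxval(AT_HWCAP2) & PPC_FEATURE2_VEC_CRYPTO) != 0;
}
#endif
```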

FWIW, in the future I plan to use more of those definitions (full list is here) to detect the system's ISA version in order to patch in more-optimized code for the newer architectures, so my personal preference is to just use the Linux kernel header so there's no possibility of copy/paste issues. That said, I'll understand completely if you want to avoid a dependency on kernel headers just for a handful of constant definitions (which AIUI should never change between kernel versions).
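As a sketch of the ISA-version detection mentioned above, assuming the PPC_FEATURE_ARCH_2_06 and PPC_FEATURE2_ARCH_* bits from the same kernel header, a helper could map the two hwcap words to the highest ISA level the kernel reports. The numeric encoding (206 for v2.06 up to 310 for v3.1) and the function name are made up for illustration.

```cpp
#include <cstdint>

// AT_HWCAP / AT_HWCAP2 bits from Linux arch/powerpc/include/uapi/asm/cputable.h.
constexpr std::uint64_t kArch206 = 0x00000100; // AT_HWCAP:  PPC_FEATURE_ARCH_2_06 (POWER7)
constexpr std::uint64_t kArch207 = 0x80000000; // AT_HWCAP2: PPC_FEATURE2_ARCH_2_07 (POWER8)
constexpr std::uint64_t kArch300 = 0x00800000; // AT_HWCAP2: PPC_FEATURE2_ARCH_3_00 (POWER9)
constexpr std::uint64_t kArch31  = 0x00040000; // AT_HWCAP2: PPC_FEATURE2_ARCH_3_1  (Power10)

// Returns the highest reported ISA level (hypothetical encoding), or 0 if
// the CPU reports nothing newer than v2.05.
constexpr int isaLevel(std::uint64_t hwcap, std::uint64_t hwcap2) {
    if (hwcap2 & kArch31)  return 310;
    if (hwcap2 & kArch300) return 300;
    if (hwcap2 & kArch207) return 207;
    if (hwcap  & kArch206) return 206;
    return 0;
}
```

Checking from newest to oldest lets a JIT pick the most optimized code path first and fall back to the v2.06 baseline otherwise.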

cyrozap added 2 commits April 6, 2026 15:35
We already query the CPU feature support in cpu.cpp, so there's no need
to do it again.
This is the same split in Debian--the ppc64el port is only supported on
POWER8 and later, so POWER7 and earlier can only run Debian ppc64
(big-endian 64-bit PowerPC). Because of this, we set the default
little-endian architecture to POWER8. And since the RandomX JIT backend
for PPC64 requires VSX, which is only supported by POWER7 and later, the
lowest we can set the default big-endian architecture to is POWER7.
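The reasoning in the last commit message can be summarized as a small policy. The sketch below is a hypothetical C++ rendering of it (this thread does not show where the default is actually set), plus an optional compile-time guard using the __VSX__ macro that GCC and Clang predefine when the target CPU has VSX.

```cpp
// Hypothetical policy: default target CPU for a PPC64 build by endianness.
constexpr const char* defaultPpc64Cpu(bool littleEndian) {
    // ppc64le distributions such as Debian ppc64el require POWER8 anyway.
    // Big-endian can go as low as POWER7, the first CPU with VSX,
    // which the JIT backend requires.
    return littleEndian ? "power8" : "power7";
}

#if defined(__powerpc64__)
// Optional guard: refuse to build for a PPC64 target without VSX.
#if !defined(__VSX__)
#error "The PPC64 JIT backend requires VSX (POWER7 or later)"
#endif
#endif
```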

Successfully merging this pull request may close these issues: POWER support