feat(bench/compare): add fio xnvme compare workflow by yonggilsong · Pull Request #64 · xnvme/aisio

yonggilsong · 2026-05-05T22:36:25Z

Summary

Add an AiSIO compare benchmark workflow for fio + xNVMe so we can compare
the spdk and upcie backends under the same benchmark matrix.

This series adds:

tasks/bench_fio_compare.yaml
fio_xnvme support in the benchmark path
raw result collection into a single compare JSON
compare HTML visualization
validation and smoke coverage
safer CPU control handling when cpupower, turbo sysfs, or CPU frequency
sysfs entries are unavailable
quoted PCI_BLACKLIST handling for multi-BDF device configs

Usage

Setup order:

cijoe --monitor -c configs/transport.toml tasks/setup_ubuntu.yaml
cijoe --monitor -c configs/transport.toml -c configs/udmabuf_import.toml tasks/setup_udmabuf_import.yaml
cijoe --monitor -c configs/transport.toml -c configs/nvstack.toml tasks/setup_nvstack.yaml
cijoe --monitor -c configs/transport.toml -c configs/aisio.toml tasks/setup_aisio.yaml

Run the compare benchmark:

cijoe --monitor \
  -c configs/transport.toml \
  -c configs/bench_fio_compare.toml \
  -c configs/devices_*.toml \
  tasks/bench_fio_compare.yaml

Results are written under cijoe-output/artifacts/fio-compare/<bdf>/.

Current defaults are runtime=5s and fio_size=16GiB.

TODO

Add a standalone fio_xnvme script with add_args() / main()
Reduce duplication in bench_fio_compare.yaml
Revisit platform-specific backend/device binding behavior
Improve CPU frequency support detection on systems without standard sysfs exposure

Allow benchmark runs to continue when turbo sysfs controls or cpupower are unavailable on the target system. Signed-off-by: Yonggil Song <yonggil.song@samsung.com>

karlowich

There's a lot take in here, but it looks very promising!
So far, I have one comment :)
I will do another pass later.

karlowich · 2026-05-08T12:14:56Z

+- name: prepare_read_upcie
+  run: |
+    {{ config.xnvme.driver.prefix }} xnvme-driver reset || true
+    bash -lc 'ns_path=$(find /sys/bus/pci/devices/{{ config.devices[0].pci_addr }}/nvme -mindepth 2 -maxdepth 2 -type d -name "nvme*n*" 2>/dev/null | head -n 1); if [ -n "$ns_path" ]; then blkdiscard -f "/dev/$(basename "$ns_path")" || true; fi'
+    modprobe uio_pci_generic
+    DRIVER_OVERRIDE=uio_pci_generic {{ config.xnvme.driver.prefix }} xnvme-driver


Is there any reason not to do all the SPDK benchmarks first and then all the uPCIe? This way you only need the "prepare" steps once per backend

I agree the backend-first structure is cleaner and would avoid repeated prepare steps. I'll rework the workflow so all SPDK runs happen together and all uPCIe runs happen together, unless I hit an ordering dependency while moving the read-prefill flow out of the helper.

I separated trim from prepare in the updated workflow. That makes the device-state transition explicit, and it also lets write and randwrite start from a freshly trimmed state instead of inheriting the state left by earlier workloads.

I suggest taking the trim code out of the workflow file and creating a separate script for it. That way we can simplify the workflow steps visually.
Likewise, we can create a separate script for the pre-fill step.
I want to move towards reducing the complexity of bench_helper and bench_runall.

That makes sense. I’ll move the trim and pre-fill logic out of the workflow.

Do you have a preference between adding separate action modules like fio_xnvme_trim.py / fio_xnvme_prefill.py, or keeping trim, pre-fill, and fio command construction together in fio_xnvme.py?

The separate modules would make the workflow steps explicit and easier to extend independently, while keeping them in fio_xnvme.py would keep the compare-benchmark-specific fio/xNVMe logic in one place.

NaddiNadja · 2026-05-08T12:32:45Z

In regards to the first commit: Why is it necessary to account for cpupower and turbo not being available on the platform? If the machine has been provisioned following the steps and tasks in the AiSIO repo, these should be available, no?

NaddiNadja

I have a couple questions / suggestions. Haven't looked through it all yet, but wanted to give this part of the feedback before going on weekend :)

yonggilsong · 2026-05-10T22:36:50Z

In regards to the first commit: Why is it necessary to account for cpupower and turbo not being available on the platform? If the machine has been provisioned following the steps and tasks in the AiSIO repo, these should be available, no?

In the ideal setup, yes, these controls should be available after following the AiSIO provisioning flow. The issue I ran into was that this assumption does not always hold in practice on all target machines. In particular, custom dmabuf kernels do not necessarily have a matching linux-tools package available for cpupower, and some platforms still do not expose the expected CPU frequency/turbo control interfaces even after provisioning. I added this handling so the benchmark reports that CPU control was unavailable instead of aborting mid-run or silently misrepresenting the environment.

Add a dedicated fio+xNVMe compare workflow for SPDK and uPCIe. Run read, randread, write, and randwrite in one workflow. Store raw results under a BDF-specific artifact root, collect them into a single benchmark-results.json, and render a compare-specific HTML report with the benchmark-fio-compare template. Signed-off-by: Yonggil Song <yonggil.song@samsung.com>

Signed-off-by: Yonggil Song <yonggil.song@samsung.com>

…sysfs Skip CPU frequency logging cleanly when the target platform does not expose the sysfs files the logger expects. This lets benchmark runs continue without aborting on empty frequency samples. Also make the dmabuf kernel setup explicitly enable the cpufreq-related kernel options needed for scaling_cur_freq reporting on this platform. Signed-off-by: Yonggil Song <yonggil.song@samsung.com>

Quote the PCI_BLACKLIST value in device configs so xnvme-driver prefixes survive shell parsing when multiple BDFs are listed. Without the quotes, a space-separated blacklist is split into separate shell words before xnvme-driver sees it. Signed-off-by: Yonggil Song <yonggil.song@samsung.com>

karlowich · 2026-05-14T09:47:21Z

@yonggilsong

In particular, custom dmabuf kernels do not necessarily have a matching linux-tools package available for cpupower

This has been addressed, now we install cpupower alongside the custom kernel.
Can do we still need the CPU-freq skip? It would be a nice simplification if we don't need it :)

yonggilsong · 2026-05-19T13:41:28Z

@yonggilsong

In particular, custom dmabuf kernels do not necessarily have a matching linux-tools package available for cpupower

This has been addressed, now we install cpupower alongside the custom kernel. Can do we still need the CPU-freq skip? It would be a nice simplification if we don't need it :)

Thanks for the update. Could you point me to the branch or PR that addresses installing cpupower alongside the custom kernel?

I’d like to test that setup first and confirm CPU frequency control works with the custom dmabuf kernel. Once that’s verified, I’ll remove the CPU-freq skip/fallback path as a follow-up simplification.

karlowich · 2026-05-22T10:39:19Z

@yonggilsong, sorry for the slow reply. It is here: 8dcf598

fix(bench/cpu): tolerate missing turbo and cpupower controls

9140cc4

Allow benchmark runs to continue when turbo sysfs controls or cpupower are unavailable on the target system. Signed-off-by: Yonggil Song <yonggil.song@samsung.com>

karlowich reviewed May 8, 2026

View reviewed changes

NaddiNadja requested changes May 8, 2026

View reviewed changes

yonggilsong added 4 commits May 14, 2026 06:56

test(bench/compare): add validation and smoke coverage

67afb9d

Signed-off-by: Yonggil Song <yonggil.song@samsung.com>

yonggilsong force-pushed the priv/yg/comp_bench branch from a563b4e to e185a26 Compare May 13, 2026 22:38

yonggilsong requested review from NaddiNadja and karlowich May 14, 2026 02:03

Conversation

yonggilsong commented May 5, 2026

Summary

Usage

TODO

Uh oh!

karlowich left a comment

Choose a reason for hiding this comment

Uh oh!

karlowich May 8, 2026

Choose a reason for hiding this comment

Uh oh!

yonggilsong May 10, 2026

Choose a reason for hiding this comment

Uh oh!

yonggilsong May 13, 2026

Choose a reason for hiding this comment

Uh oh!

karlowich May 14, 2026

Choose a reason for hiding this comment

Uh oh!

yonggilsong May 19, 2026

Choose a reason for hiding this comment

Uh oh!

NaddiNadja commented May 8, 2026

Uh oh!

NaddiNadja left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yonggilsong commented May 10, 2026

Uh oh!

karlowich commented May 14, 2026

Uh oh!

yonggilsong commented May 19, 2026

Uh oh!

karlowich commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants