
Add fuzzer harness and fix bugs it found. #363

Merged
diegonehab merged 25 commits into main from feature/fuzz
Apr 9, 2026

Conversation

Contributor

@diegonehab diegonehab commented Mar 18, 2026

Summary

This branch introduces libFuzzer-based fuzz testing for the RISC-V interpreter and shadow state, implements lazy TLB verification, and fixes five bugs uncovered by fuzzing.

Commits

  • fix: ensure PC alignment invariant at startup — Raise MCAUSE_INSN_ADDRESS_MISALIGNED if PC has bit 0 set when entering the interpreter, rather than relying on the fetch logic to handle it.

  • fix: enforce WARL registers when reading — Centralize all WARL bit-masking into the i-state-access layer via riscv-warl.h, so external writes (C API, snapshots) can't store illegal bit patterns. Also corrects xtvec masking (& ~1 → & ~3).

  • fix: properly invalidate fetch cache — Use ~pc as the miss sentinel instead of TLB_INVALID_PAGE (which could produce false hits at the top of virtual memory). Add invalidation after fetch exceptions and privilege changes.

  • fix: assert_no_break with multiple interrupts — Only assert that delegated interrupts are zero in S/U-mode, since non-delegated M-mode interrupts can legitimately remain pending.

  • feat: regression tests for fuzzer bugs — Lua test suite (spec-fuzzer-bugs.lua) covering all four bugs above.

  • feat: add fuzzer support to step verification — New fuzz-step target that runs each fuzzed input through four independent execution paths (cm_run, cm_run_uarch, cycle-by-cycle uarch fraud proofs, page-based fraud proofs) and asserts all produce identical root hashes. Refactors fuzz input parsing into shared fuzz-common.h.

  • feat: lazy verification/heating of TLB slots — Replace eager TLB shadow validation at machine construction with lazy per-slot validation on first access. Hot entries start as TLB_UNVERIFIED_PAGE and are promoted on demand. Hardens replay against attacker-crafted step logs, adds PMA bounds verification, and guards do_read_pma against out-of-bounds indices.

  • feat: add shadow-state fuzzer — New fuzz-shadow-state target that writes the entire shadow state (registers + TLB) via cm_write_memory with hostile data, then runs the interpreter. Uses registers_state struct directly for corpus compatibility with fuzz-interpret. Crafts TLB entries targeting discovered PMAs (memory-backed, device, out-of-bounds) with correct slot placement for actual TLB hits. Adds FUZZ_FOCUS build variable to restrict coverage instrumentation to specific source files, and a fuzz-coverage Makefile target for llvm-cov HTML reports.

  • fix: default coverage generation to clang on macOS — COVERAGE_TOOLCHAIN in tests/Makefile now defaults to clang on Darwin, so coverage-report uses llvm-profdata/llvm-cov instead of gcov/gcovr.

  • fix: decouple iunrep from mutable shadow state — poll_external_interrupts and other runtime checks now use machine::is_unreproducible() (reads from immutable config) instead of the shadow register. WFI clamps mcycle_max to mcycle_end to prevent overshooting. Adds a consistency check between config and shadow iunrep on load, and a regression test.

  • feat: add persistent mode to fuzz targets (~44x faster) — Reuse a single machine across fuzz inputs instead of creating and destroying one per input. Each iteration zeros RAM and overwrites the full shadow state (registers + TLB) via a bulk cm_write_memory() call, which also reinitializes the hot TLB cache. The old per-input mode is still available via FUZZ_NO_PERSIST=1. Merges the shadow-state fuzzers into the interpret fuzzers (both now use shadow state bulk writes), unifies the corpus directory, generates the seed corpus as part of build-tests-machine, and adds comprehensive comments for newcomers.

Bugs Found and Fixed

1. Misaligned PC at startup

External APIs could set PC to an odd value before calling run(), violating the 2-byte alignment invariant the fetch logic depends on. Fixed by checking PC alignment at interpreter entry and raising MCAUSE_INSN_ADDRESS_MISALIGNED if bit 0 is set.

2. WARL registers not legalized through state access layer

WARL bit-masking was only applied inside CSR instruction handlers, so external writes (C API, snapshots) could store illegal bit patterns consumed raw by the interpreter. Fixed by centralizing all WARL legalization into the i-state-access layer via riscv-warl.h. Also corrected xtvec masking (& ~1 → & ~3).

3. Fetch cache incorrectly invalidated

The fetch cache used TLB_INVALID_PAGE as its miss sentinel, but the XOR-based hit test could produce false hits when PC was in the last page of virtual memory. The cache also wasn't invalidated after fetch exceptions or privilege changes. Fixed by using ~pc as the sentinel (guaranteed miss) and adding invalidation after exceptions and raise_interrupt_if_any.

4. Debug assertion too strict with multiple pending interrupts

assert_no_brk() required all pending interrupts to be zero after instruction execution. In S/U-mode, non-delegated M-mode interrupts can legitimately remain pending for the outer loop to handle. Fixed by only asserting that delegated interrupts are zero in S/U-mode.

5. iunrep read from mutable shadow state at runtime

poll_external_interrupts reads iunrep from the shadow state on every call, meaning a corrupted shadow state can flip the machine into unreproducible mode mid-execution. WFI then advances mcycle up to rtc_time_to_cycle(clint_mtimecmp), which can exceed mcycle_end, causing mcycle to overshoot silently in release builds (or triggering a debug assertion). Fixed by: (a) adding machine::is_unreproducible() that reads from the immutable config instead of shadow state, and using it everywhere iunrep was previously read at runtime; (b) clamping WFI's mcycle_max to mcycle_end so poll_external_interrupts can never advance mcycle past the requested limit; (c) adding a consistency check in validate_processor_shadow to ensure the shadow iunrep matches the config value when loading from disk.

@diegonehab diegonehab requested a review from edubart March 18, 2026 14:55
@diegonehab diegonehab force-pushed the feature/fuzz branch 12 times, most recently from b66f293 to 8270534 on March 23, 2026 17:33
@diegonehab diegonehab marked this pull request as draft March 24, 2026 16:05
@diegonehab diegonehab marked this pull request as ready for review March 24, 2026 18:59
@diegonehab diegonehab requested a review from mpernambuco March 24, 2026 19:00
@edubart edubart moved this from Todo to Waiting Review in Machine Unit Mar 31, 2026
@edubart edubart added this to the v0.20.0 milestone Mar 31, 2026
Collaborator

@edubart edubart left a comment


I have compiled and run it locally, tested it, run the fuzz tests, and read and reasoned about most of the changes. Not much to change, besides optionally adding more tests covering what was introduced, to increase my confidence in the TLB details across all state accessors (which are fragile).

I also ran some basic benchmarks and saw no significant impact, although the benchmarking was not comprehensive.

I think at least the build issues I had with tests/Makefile should be fixed, otherwise they break my development workflow here; see comments.

Comment thread src/replay-step-state-access.h Outdated
Comment thread src/replay-step-state-access.h
Comment thread tests/Makefile
Comment thread Makefile
slot_paddr, SHADOW_TLB_SLOT_LOG2_SIZE,
[this, set_index, slot_index, vaddr_page, vp_offset, pma_index]() {
m_m.write_shadow_tlb(set_index, slot_index, vaddr_page, vp_offset, pma_index);
m_m.write_unverified_tlb(set_index, slot_index, vaddr_page, vp_offset, pma_index);
Collaborator


According to the coverage report downloaded from CI, we have no coverage for this line, nor for the do_write_tlb functions, in:

  • record_step_state_access
  • replay_step_state_access
  • uarch_record_state_access
  • uarch_replay_state_access

I would feel more confident if we had tests run by the CI that covered them (outside the fuzzer).

Contributor Author


Let me know if this summary helps:

Coverage improvement summary

Starting point

The CI coverage job was missing the test-machine-with-log-step target, which runs
cartesi-machine-tests.lua with the run_step command. This meant log_step() /
verify_step() were never exercised during coverage, leaving the entire
replay_step_state_access path (including do_write_tlb) at 0% for the big-machine
replay.

Adding test-machine-with-log-step to the CI coverage job brought
replay-step-state-access.h from 68.8% to 87.5% and functions from 60.2% to 81.9%
overall.

Identifying the next gaps

With the happy-path replay now covered, the remaining gaps were:

| File | Before | After adding log_step |
| --- | --- | --- |
| replay-send-cmio-state-access.h | 62.4% | 62.4% (unchanged) |
| uarch-replay-state-access.h | 84.0% | 84.0% (unchanged) |

The cmio replay verifier had zero coverage of its ~27 validation error paths. The uarch
replay had no coverage of TLB write verification. These are the paths that reject
invalid state transitions during fraud proof disputes.

What changed vs the original plan

The initial analysis proposed writing tests inspired by the old cheat/simple/
tournament code, which wrapped machines to produce dishonest state transitions. That
approach would have required building a valid machine abstraction and was more complex
than necessary.

Feedback during the session redirected the approach in several ways:

  1. No need for a dishonest machine wrapper. Instead of corrupting the machine and
    re-running, we produce a valid log and then corrupt the log itself before calling
    verify. This directly tests what matters: that the verifier rejects bad logs.

  2. Fix the JSON deserialization layer. The Lua-to-C++ serialization in
    json-util.cpp was silently validating and filtering access log data before the C++
    replay code could see it. Specifically:

    • The written field was dropped for read accesses, making the "unexpected written
      data in read access" error unreachable.
    • Size checks on read, written, and sibling_hashes data threw JSON-layer
      errors before the C++ replay validation could run.
    • A bug in the error message reported "written" when checking read data size.

    The fix was to remove all validation from the JSON layer and let the C++ replay code
    handle it. The JSON layer should faithfully deserialize; validation is the replay
    code's job, since that is the code that mirrors what runs on-chain.

  3. The "differs only by written word" error was reachable. Initially dismissed as
    unreachable because HASH_TREE_WORD_SIZE == sizeof(uint64_t). But the hash tree
    word size is actually 32 bytes, not 8. Register writes modify an 8-byte word within
    a 32-byte leaf. Corrupting a different 8-byte region in the leaf while preserving
    the written word triggers this check.

  4. Truncation tests at different points. The initial test only truncated the last
    access. Different truncation points hit different "too few accesses" checks in
    check_read, check_write, and do_write_memory_with_padding.

  5. Make posix.stdlib optional in test utilities. The test utility required
    posix.stdlib for realpath, which prevented running tests on macOS without the
    posix Lua module. The fix uses pcall and falls back to relative paths.

  6. Use the lester test framework. The test was initially standalone with its own
    harness. It was converted to use the project's lester framework and integrated
    into test-spec.lua, so it runs as part of the existing test-lua CI target.

verify_step failure tests (big-machine replay)

The big-machine replay verifier (replay_step_state_access) uses a page-based binary
step log format, unlike the uarch verifier which uses per-access JSON logs. Testing it
required a completely different approach: producing a valid binary log via log_step,
then surgically corrupting it before calling verify_step.

Feedback steered the design:

  1. Binary log corruption, not JSON corruption. Claude initially suggested reusing the
    uarch test pattern of corrupting JSON access logs. But verify_step takes a binary
    step log file, not a JSON access log. The test had to understand the binary format
    (root hashes, mcycle, hash function type, page entries with index/data/scratch-hash,
    sibling hashes) and corrupt specific fields.

  2. Build a log from scratch for structural tests. For tests like "too few sibling
    hashes at leaf level" or "too many pages", corrupting a valid log was insufficient.
    The test needed build_step_log and get_siblings_for_pages helpers to construct
    logs with precise page/sibling combinations.

  3. Page corruption must fool the root hash check. Claude initially proposed simply
    removing pages from a valid log. But verify_step computes the root hash from the
    logged pages and siblings before replaying -- a missing page would fail the root hash
    check, not the replay. To test "required page not found" during replay, the test had
    to recompute the root hash for the reduced page set, which required computing fresh
    sibling hashes from the machine's Merkle tree.

  4. Corrupt page data, not page presence, for replay errors. To trigger interpreter
    errors during replay (e.g., exception on a corrupted instruction), the test corrupts
    the page data while keeping the page in the log and recomputing the page's hash to
    match, then recomputing the root hash with fresh siblings.

Fixing record/replay TLB validation

The lazy TLB verification (commit 3ac97eb) introduced do_init_hot_tlb_slot which
validates shadow TLB entries on first access. But record_step_state_access had a bug:
it touched the PMA page only for valid entries, meaning replay could not validate corrupt
entries (it would fail trying to read PMA data that wasn't in the log).

Feedback identified this:

  1. Record must touch the PMA page unconditionally. Claude initially thought the
    existing code was already correct. But the record side was only touching the PMA page
    inside the if (vaddr_page != TLB_INVALID_PAGE) block. Corrupt TLB entries (garbage
    pma_index) still need the PMA page in the log so replay can call do_read_pma and
    detect the corruption.

  2. Validate first, then touch the target page. The old code touched the target page
    before validation. The fix reorders: validate via init_hot_tlb_slot, then touch the
    target page only if validation passed.

  3. Clamp out-of-bounds pma_index. pmas_get_abs_addr could compute an address
    outside the PMAs region for a corrupt pma_index. The fix clamps to a sentinel entry
    at PMA_MAX, which has zeroed istart/ilength and fails is_memory(), causing
    shadow_tlb_verify_slot to reject the entry.

  4. TLB thrash test. A dedicated RISC-V test binary (thrash-tlb.S) exercises all
    TLB sets by reading/writing addresses that map to every TLB slot, forcing evictions
    and re-validations. Pre/post Lua scripts set up corrupt shadow TLB entries and verify
    the machine handles them correctly during log_step/verify_step.

Marking genuinely unreachable code

During coverage analysis of replay_step_state_access, several code paths were
identified as genuinely unreachable despite being defensive checks. Each was marked with
LCOV_EXCL_START/LCOV_EXCL_STOP and a comment explaining why.

Claude initially proposed writing tests to cover some of these. Analysis during the
session proved they were unreachable:

  1. Leaf-level "too few sibling hashes" (line 391). To reach this code, we would
    need pages[next_page].index < page_index at leaf level. Claude initially tried to
    construct an adversarial log to hit this path. After tracing the recursion in
    compute_root_hash_impl, the proof emerged: pages are sorted and consumed
    left-to-right, so the next unconsumed page always has index >= the current leaf's
    page_index.

  2. find_page(host_addr) "required page not found" (line 324). The only caller is
    do_write_tlb, which receives vh_offset from the interpreter's page walk.
    vh_offset is computed from do_get_faddr, which already called
    find_page(uint64_t) successfully for the same page. Claude initially proposed
    removing the host_addr overloads as dead code left over from the pre-lazy-TLB bulk
    relocation (commit 3ac97eb). But do_write_tlb at line 562 does call
    find_page(host_addr) to reverse-translate vh_offset back to a physical address --
    the overloads are needed, just their error path is unreachable.

  3. do_read_memory / do_write_memory / do_putchar. These are required by the
    i_state_access CRTP interface but never called during step replay.

  4. Page data ordering check (line 248). Pages are stored contiguously in the parsed
    log, so their data addresses are always in increasing order by construction. The
    check guards against a hypothetical future where page data is independently
    allocated. This was already marked with LCOV_EXCL using LCOV_EXCL_END instead
    of LCOV_EXCL_STOP -- gcovr requires LCOV_EXCL_STOP.

Changes made

src/json-util.cpp

  • Removed log2_size >= 64 bounds check from access log deserialization.
  • Removed read/written data size validation.
  • Removed sibling_hashes depth validation.
  • Removed the type == write guard on deserializing the written field.
  • Fixed error message that said "written" when checking read data size.

src/pmas.h

  • pmas_get_abs_addr clamps out-of-bounds pma_index to a sentinel entry at PMA_MAX.
  • Added static_assert that there is room for the sentinel entry.

src/machine-address-ranges.cpp

  • Added bounds check on PMA count in push_back.

src/machine.cpp

  • Added static_assert and runtime check for PMA count in init_pmas_contents.

src/record-step-state-access.h

  • do_init_hot_tlb_slot: touch PMA page unconditionally (before validation), validate
    first via init_hot_tlb_slot, then touch target page only if valid.

src/replay-step-state-access.h

  • Added LCOV_EXCL_START/LCOV_EXCL_STOP with explanatory comments for five
    unreachable code paths: page data ordering check, find_page(host_addr) error,
    leaf-level sibling check, do_read_memory/do_write_memory, and do_putchar.
  • Fixed LCOV_EXCL_END -> LCOV_EXCL_STOP (gcovr requires STOP, not END).
  • do_write_tlb: write zero_padding field to shadow TLB.

src/replay-send-cmio-state-access.h

  • Added LCOV_EXCL markers for six lines that are genuinely unreachable through the
    current API (aligned address checks, empty-log read check, null data check, address
    mismatch in check_read).
  • Fixed error message capitalization.

src/clua-cartesi.cpp, src/machine-c-api.cpp, src/machine-c-api.h

  • Exposed AR_SHADOW_STATE_START, AR_SHADOW_STATE_LENGTH, AR_PMAS_START,
    AR_PMAS_LENGTH constants through Lua and C APIs for use by TLB validation tests.

tests/lua/cartesi/tests/util.lua

  • Made posix.stdlib optional via pcall. Falls back to relative paths.

tests/lua/machine-bind.lua

  • Updated error expectations in verify_reset_uarch and verify_step_uarch unhappy
    path tests to match the C++ replay errors instead of the removed JSON-layer errors.

tests/lua/spec-verify-uarch-failure.lua (new)

67 tests exercising every reachable validation error path:

  • verify_step_uarch (30 tests): basic step corruptions (empty log, extra access,
    wrong type/address/log2_size, corrupt data/hashes/siblings, ordinal formatting for
    1st-4th accesses, wrong final hash) and TLB write corruptions via the
    ecall-write-tlb test binary (wrong type/address, corrupt siblings, wrong/missing
    written_hash, corrupt read/written data).
  • verify_send_cmio_response (37 tests): check_read errors (7 tests),
    do_write_memory_with_padding errors (8 tests), check_write errors (12 tests),
    log structure errors (5 tests including truncation at three different points),
    ordinal coverage, zero-length data path, and wrong final hash.

tests/lua/spec-verify-step-failure.lua (new)

24 tests exercising the binary step log verifier:

  • Log parsing errors (9 tests): truncation at each field boundary (root hash before,
    mcycle count, root hash after, hash function type, page count, sibling count, sibling
    hashes), extra trailing data, and unsupported hash function type.
  • Page validation errors (4 tests): out-of-order page indices, non-zero scratch hash,
    extra pages beyond the log's page count, and too many pages in the Merkle tree
    reconstruction.
  • Sibling hash errors (3 tests): too few siblings at internal level, too few siblings
    at leaf level, and too many siblings.
  • Root hash / replay errors (5 tests): initial root hash mismatch, wrong mcycle
    count, wrong root hash after, missing page during replay (requiring recomputation of
    sibling hashes for a reduced page set), and corrupt page data causing an interpreter
    exception during replay.
  • Hash function coverage (3 tests): all tests run with both SHA-256 and Keccak-256,
    plus an explicit test for unsupported hash function type.

tests/lua/pre-thrash-tlb.lua, tests/lua/post-thrash-tlb.lua (new)

Setup and verification scripts for the TLB thrash test, which exercises log_step /
verify_step with corrupt shadow TLB entries across all TLB sets.

tests/machine/src/thrash-tlb.S (new)

RISC-V test binary that reads and writes addresses mapping to every TLB slot, forcing
evictions and re-validations.

tests/lua/test-spec.lua

  • Added require("spec-verify-step-failure") and require("spec-verify-uarch-failure")
    so both tests run as part of test-lua.

.github/workflows/build.yml

  • Added test-machine-with-log-step to the coverage CI job.

tests/Makefile

  • Fixed build-tests-all to work inside the container without forcing fuzzer seed corpus
    generation.

Coverage results

Overall (coverage -> coverage2):

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Lines | 18578/23339 (79.6%) | 18827/23312 (80.7%) | +249, +1.1% |
| Functions | 3827/6361 (60.2%) | 5227/6358 (82.2%) | +1400, +22.0% |
| Branches | 8985/20588 (43.6%) | 9377/20506 (45.7%) | +392, +2.1% |

Critical verification files:

| File | Before | After |
| --- | --- | --- |
| replay-step-state-access.h | 68.8% | 100.0% |
| replay-send-cmio-state-access.h | 62.4% | 97.8% |
| uarch-replay-state-access.h | 84.0% | 99.4% |
| uarch-record-state-access.h | 85.4% | 94.2% |
| record-step-state-access.h | 53.7% | 82.9% |
| shadow-tlb.h | 40.4% | 55.3% |
| shadow-uarch-state.h | 31.3% | 41.0% |
| pmas.h | 60.3% | 64.9% |
| address-range.h | 76.9% | 77.8% |
| machine.cpp | 86.9% | 87.9% |

Remaining uncovered lines

In replay-send-cmio-state-access.h (4 uncovered, all marked LCOV_EXCL):

  • "address not aligned to word size" -- register addresses are always 8-byte aligned.
  • "too few accesses in log" in check_read -- the constructor catches empty logs
    first, and the only read (iflags.Y) is always first.
  • Address mismatch in check_read -- only one read address is used (iflags.Y).

In uarch-replay-state-access.h (1 uncovered):

  • return "unknown_" in access_type_name -- only read and write exist.

In replay-step-state-access.h (0 uncovered after LCOV_EXCL markers):

  • All remaining uncovered paths are marked with LCOV_EXCL and documented with proofs
    of unreachability (see "Marking genuinely unreachable code" above).

Many not-taken branches in the coverage report are GCC's exception-handling machinery
(implicit branches for std::string allocation failure inside throw expressions),
not real logic branches.

Comment thread src/interpret.cpp
// In contrast, a STATE_ACCESS that does not have access to hot out-of-state slots cannot mark TLB slots
// as not-yet-initialized.
// We must verify the cold slot at every hit and treat inconsistent entries as misses
if (!a.template verify_cold_tlb_slot<TLB_READ>(slot_index)) [[unlikely]] {
Collaborator


According to the coverage report downloaded from CI, we have no coverage for the case when verify_cold_tlb_slot fails (for state accessors that implement it) and execution falls into this if, for:

  • fetch_translate_pc
  • read_virtual_memory
  • write_virtual_memory

I would feel more confident if we had tests run by the CI that covered them (outside the fuzzer).
It could be a simple test that thrashes the TLB on purpose.

Contributor Author


The only state access that implements verify_cold_tlb_slot (other than simply returning true) is the one in the uarch bridge, used to compile interpret into uarch.bin so it can run inside the uarch. So unless we add a way to extract coverage from interpret while it is running inside the uarch, the coverage will not show...

Contributor Author


Guess I will try that. :)

Contributor Author

@diegonehab diegonehab Apr 7, 2026


Uarch coverage collection

Motivation

The emulator's interpret() function is compiled twice: once for the host
(with gcov instrumentation), and once as a RISC-V binary that runs inside the
microarchitecture emulator (uarch-ram.bin). The host coverage report misses
code paths that are only exercised inside the uarch -- most notably
machine-uarch-bridge-state-access.h (which is never compiled into the host)
and the failure branch of verify_cold_tlb_slot() in interpret.cpp (which
requires the bridge state access to trigger).

How it works

1. Separate coverage uarch binary

The production uarch-ram.bin is compiled with -O2 -g0 for performance.
For coverage, a separate uarch-ram-coverage.bin is built alongside it
(when coverage=yes) with -O0 -g -DCODE_COVERAGE. This gives:

  • Full debug info for accurate addr2line PC-to-source mapping
  • No inlining (CODE_COVERAGE disables FORCE_INLINE, which otherwise
    uses __attribute__((always_inline)) and defeats -fno-inline)
  • No --gc-sections (which can strip debug sections)

Both binaries are built from the same source using separate object files
(.cov_cpp.o / .cov_c.o suffixes) so they don't interfere.

The production binary is used for all normal tests. The coverage binary is
only loaded for the run_uarch_coverage tests via --uarch-ram-image.

2. PC collection during test runs

The run_uarch_coverage command in cartesi-machine-tests.lua runs tests
through the uarch interpreter one cycle at a time, reading uarch_pc before
each cycle and collecting unique PCs into a Lua table. After each test, the
PCs are written to a .pcs file (one hex address per line) in the directory
specified by --uarch-pc-output-dir.

The test-coverage-uarch-pcs Makefile target runs the csr and thrash-tlb
tests in this mode. This target is separate from test-coverage-uarch
(which runs the validation tests without PC collection) so that non-coverage
CI jobs (e.g. sanitize) don't need the coverage binary.

Tests with pre/post scripts (like the thrash-tlb corruption test) get a hash
suffix in the .pcs filename to avoid collisions with the plain version.

3. Resolving PCs to source lines

The tests/scripts/uarch-pcs-to-gcov.lua script resolves the collected PCs
to source file, function name, and line number using addr2line -f against
uarch/uarch-ram-coverage.elf.

The script handles DWARF path resolution in three cases:

  • Direct match: DWARF paths match the local gcov_dir prefix (e.g.
    /usr/src/emulator/src/interpret.cpp on CI). The prefix is stripped to
    get the bare filename.

  • Project root match: DWARF paths are under the project root but outside
    gcov_dir (e.g. /usr/src/emulator/uarch/machine-uarch-bridge-state-access.h).
    The project root is computed from gcov_dir and the path is made relative
    (e.g. ../uarch/machine-uarch-bridge-state-access.h).

  • Basename fallback: DWARF paths don't match the local tree at all (e.g.
    the ELF was built inside Docker with paths like /opt/cartesi/...). The
    script extracts the basename and checks if the file exists under uarch/
    or src/ in the local tree.

Paths outside the project (e.g. C++ standard library headers) are filtered
out. If addr2line is not available, the script exits gracefully and the
report is generated without uarch data.

4. Running gcov with proper merging

The tests/scripts/run-gcov.lua script runs gcov (or llvm-cov gcov)
on each .gcda file individually and merges the resulting .gcov files.

This works around a bug in llvm-cov gcov: when processing multiple .gcda
files that share headers, it overwrites the .gcov file for each shared
header rather than accumulating counts. GNU gcov merges correctly but the
script works with both.

The merge adds execution counts from all versions of each source line, and
prefers ##### (uncovered but executable) over - (non-executable) for
lines that appear in only some compilation units.

5. Merging uarch coverage into .gcov files

After run-gcov.lua produces the host .gcov files, uarch-pcs-to-gcov.lua
modifies them before gcovr reads them:

  • Existing .gcov files (e.g. interpret.cpp.gcov): lines marked as
    uncovered (#####) that were hit by the uarch get their count replaced.
    Lines already marked as executed by the host get the uarch count added.

  • New .gcov files (e.g. for machine-uarch-bridge-state-access.h):
    created from scratch with function records (required by gcovr to recognize
    executable lines) and line hit counts. Non-hit lines are marked as
    non-executable (-) since there is no way to determine which lines the
    compiler considers executable without gcov instrumentation data.

6. Generating the report

gcovr --use-gcov-files reads all .gcov files from src/ and produces
the HTML report and text summary. The --filter flags include both src/
and uarch/ directories to pick up the bridge header and other uarch-only
source files.

On systems without the RISC-V toolchain, the uarch-pcs-to-gcov.lua script
runs inside the toolchain Docker container (which has
riscv64-unknown-elf-addr2line). The gcov and gcovr steps run on the host.

7. Unified coverage toolchain

On macOS, clang coverage now uses --coverage (gcc-compatible .gcno/.gcda
format) instead of -fprofile-instr-generate -fcoverage-mapping. This means
the same gcov/gcovr pipeline works on both Linux (gcc) and macOS (clang),
and LCOV_EXCL_START/LCOV_EXCL_STOP markers are respected on both
platforms. The COVERAGE_TOOLCHAIN variable is exported from tests/Makefile
so sub-makes inherit the correct value.

Limitations

  • For source files that exist only in the uarch binary (like the bridge
    header), all non-hit lines appear as non-executable in the report. This
    means the report shows which lines were executed, but cannot show which
    lines should have been executed but were not.

  • Even with -O0, the coverage binary is a different compilation from the
    host. Template instantiations may differ, so some lines in shared headers
    might not be attributed identically.

Running locally

From a clean checkout:

make submodules
make -j$(nproc) coverage=yes
make -C tests build-tests-machine-with-toolchain coverage=yes
make -C tests build-tests-misc coverage=yes
make -C tests build-tests-uarch-with-toolchain coverage=yes
make -C tests build-tests-images coverage=yes
eval $(make env)
make -C tests -j1 coverage=yes \
    test-save-and-load \
    test-machine \
    test-lua \
    test-c-api \
    test-coverage-machine \
    test-uarch-rv64ui \
    test-uarch-interpreter \
    test-coverage-uarch \
    test-coverage-uarch-pcs \
    test-machine-with-log-step
make -C tests coverage-report coverage=yes
# Report at tests/build/coverage/gcc/index.html

To regenerate just the report (after tests have already run):

make -C tests coverage-report coverage=yes

Files

  • uarch/Makefile -- builds both uarch-ram.bin (production) and
    uarch-ram-coverage.bin (with -O0 -g -DCODE_COVERAGE) when coverage=yes
  • tests/lua/cartesi-machine-tests.lua -- run_uarch_coverage command and
    --uarch-pc-output-dir / --uarch-ram-image options
  • tests/scripts/run-gcov.lua -- runs gcov per .gcda with proper merging
  • tests/scripts/uarch-pcs-to-gcov.lua -- resolves PCs and merges into
    .gcov files
  • tests/scripts/generate-coverage-report.sh -- standalone script for
    running the full coverage pipeline
  • tests/Makefile -- test-coverage-uarch (validation tests),
    test-coverage-uarch-pcs (PC collection), coverage-report (report
    generation)
  • .github/workflows/build.yml -- CI coverage job runs both targets

Contributor Author


Done.

Collaborator


Amazing 💯

Comment thread src/interpret.cpp Outdated
Comment thread src/riscv-warl.h
@github-project-automation github-project-automation Bot moved this from Waiting Review to In Progress in Machine Unit Mar 31, 2026
Instead of using TLB_INVALID_PAGE, the correct invalidation is ~pc.
This ensures the xor trick in fetch_insn doesn't fail when pc is in
the last page of virtual memory.
@diegonehab diegonehab force-pushed the feature/fuzz branch 18 times, most recently from be7b44f to 1558faa on April 8, 2026 19:39
@diegonehab diegonehab requested a review from edubart April 8, 2026 20:20
@github-project-automation github-project-automation Bot moved this from In Progress to Waiting Merge in Machine Unit Apr 8, 2026
Collaborator

@edubart edubart left a comment


For me it is good enough already, good work!

@diegonehab diegonehab merged commit 384ec21 into main Apr 9, 2026
9 checks passed
@diegonehab diegonehab deleted the feature/fuzz branch April 9, 2026 08:17
@github-project-automation github-project-automation Bot moved this from Waiting Merge to Done in Machine Unit Apr 9, 2026
@edubart edubart mentioned this pull request Apr 9, 2026
@edubart edubart added the enhancement New feature or request label Apr 9, 2026