
Add fuzzer harness and fix bugs it found. #363

Merged
diegonehab merged 25 commits into main from feature/fuzz
Apr 9, 2026

Conversation

Contributor

@diegonehab diegonehab commented Mar 18, 2026

Summary

This branch introduces libFuzzer-based fuzz testing for the RISC-V interpreter and shadow state, implements lazy TLB verification, and fixes five bugs uncovered by fuzzing.

Commits

  • fix: ensure PC alignment invariant at startup — Raise MCAUSE_INSN_ADDRESS_MISALIGNED if PC has bit 0 set when entering the interpreter, rather than relying on the fetch logic to handle it.

  • fix: enforce WARL registers when reading — Centralize all WARL bit-masking into the i-state-access layer via riscv-warl.h, so external writes (C API, snapshots) can't store illegal bit patterns. Also corrects xtvec masking (& ~1 → & ~3).

  • fix: properly invalidate fetch cache — Use ~pc as the miss sentinel instead of TLB_INVALID_PAGE (which could produce false hits at the top of virtual memory). Add invalidation after fetch exceptions and privilege changes.

  • fix: assert_no_break with multiple interrupts — Only assert that delegated interrupts are zero in S/U-mode, since non-delegated M-mode interrupts can legitimately remain pending.

  • feat: regression tests for fuzzer bugs — Lua test suite (spec-fuzzer-bugs.lua) covering all four bugs above.

  • feat: add fuzzer support to step verification — New fuzz-step target that runs each fuzzed input through four independent execution paths (cm_run, cm_run_uarch, cycle-by-cycle uarch fraud proofs, page-based fraud proofs) and asserts all produce identical root hashes. Refactors fuzz input parsing into shared fuzz-common.h.

  • feat: lazy verification/heating of TLB slots — Replace eager TLB shadow validation at machine construction with lazy per-slot validation on first access. Hot entries start as TLB_UNVERIFIED_PAGE and are promoted on demand. Hardens replay against attacker-crafted step logs, adds PMA bounds verification, and guards do_read_pma against out-of-bounds indices.

  • feat: add shadow-state fuzzer — New fuzz-shadow-state target that writes the entire shadow state (registers + TLB) via cm_write_memory with hostile data, then runs the interpreter. Uses registers_state struct directly for corpus compatibility with fuzz-interpret. Crafts TLB entries targeting discovered PMAs (memory-backed, device, out-of-bounds) with correct slot placement for actual TLB hits. Adds FUZZ_FOCUS build variable to restrict coverage instrumentation to specific source files, and a fuzz-coverage Makefile target for llvm-cov HTML reports.

  • fix: default coverage generation to clang on macOS — COVERAGE_TOOLCHAIN in tests/Makefile now defaults to clang on Darwin, so coverage-report uses llvm-profdata/llvm-cov instead of gcov/gcovr.

  • fix: decouple iunrep from mutable shadow state — poll_external_interrupts and other runtime checks now use machine::is_unreproducible() (reads from immutable config) instead of the shadow register. WFI clamps mcycle_max to mcycle_end to prevent overshooting. Adds a consistency check between config and shadow iunrep on load, and a regression test.

  • feat: add persistent mode to fuzz targets (~44x faster) — Reuse a single machine across fuzz inputs instead of creating and destroying one per input. Each iteration zeros RAM and overwrites the full shadow state (registers + TLB) via a bulk cm_write_memory() call, which also reinitializes the hot TLB cache. The old per-input mode is still available via FUZZ_NO_PERSIST=1. Merges the shadow-state fuzzers into the interpret fuzzers (both now use shadow state bulk writes), unifies the corpus directory, generates the seed corpus as part of build-tests-machine, and adds comprehensive comments for newcomers.

Bugs Found and Fixed

1. Misaligned PC at startup

External APIs could set PC to an odd value before calling run(), violating the 2-byte alignment invariant the fetch logic depends on. Fixed by checking PC alignment at interpreter entry and raising MCAUSE_INSN_ADDRESS_MISALIGNED if bit 0 is set.

2. WARL registers not legalized through state access layer

WARL bit-masking was only applied inside CSR instruction handlers, so external writes (C API, snapshots) could store illegal bit patterns consumed raw by the interpreter. Fixed by centralizing all WARL legalization into the i-state-access layer via riscv-warl.h. Also corrected xtvec masking (& ~1 → & ~3).

3. Fetch cache incorrectly invalidated

The fetch cache used TLB_INVALID_PAGE as its miss sentinel, but the XOR-based hit test could produce false hits when PC was in the last page of virtual memory. The cache also wasn't invalidated after fetch exceptions or privilege changes. Fixed by using ~pc as the sentinel (guaranteed miss) and adding invalidation after exceptions and raise_interrupt_if_any.

4. Debug assertion too strict with multiple pending interrupts

assert_no_brk() required all pending interrupts to be zero after instruction execution. In S/U-mode, non-delegated M-mode interrupts can legitimately remain pending for the outer loop to handle. Fixed by only asserting that delegated interrupts are zero in S/U-mode.

5. iunrep read from mutable shadow state at runtime

poll_external_interrupts reads iunrep from the shadow state on every call, meaning a corrupted shadow state can flip the machine into unreproducible mode mid-execution. WFI then advances mcycle up to rtc_time_to_cycle(clint_mtimecmp), which can exceed mcycle_end, causing mcycle to overshoot silently in release builds (or triggering a debug assertion). Fixed by: (a) adding machine::is_unreproducible() that reads from the immutable config instead of shadow state, and using it everywhere iunrep was previously read at runtime; (b) clamping WFI's mcycle_max to mcycle_end so poll_external_interrupts can never advance mcycle past the requested limit; (c) adding a consistency check in validate_processor_shadow to ensure the shadow iunrep matches the config value when loading from disk.

@diegonehab diegonehab requested a review from edubart March 18, 2026 14:55
@diegonehab diegonehab force-pushed the feature/fuzz branch 12 times, most recently from b66f293 to 8270534 on March 23, 2026 17:33
@diegonehab diegonehab marked this pull request as draft March 24, 2026 16:05
@diegonehab diegonehab marked this pull request as ready for review March 24, 2026 18:59
@diegonehab diegonehab requested a review from mpernambuco March 24, 2026 19:00
@edubart edubart moved this from Todo to Waiting Review in Machine Unit Mar 31, 2026
@edubart edubart added this to the v0.20.0 milestone Mar 31, 2026
Collaborator

@edubart edubart left a comment


I have compiled and run it locally, tested it, run the fuzz tests, and read and reasoned about most of the changes. Not much to change, besides optionally adding more tests covering what was introduced, to increase my confidence in the TLB details across all state accessors (which are fragile).

I also ran some basic benchmarks and saw no significant impact, although the benchmarking was not comprehensive.

I think at least the build issues I had with tests/Makefile should be fixed, otherwise they break my development workflow here; see comments.

Comment thread src/replay-step-state-access.h Outdated
Comment thread src/replay-step-state-access.h
Comment thread tests/Makefile
Comment thread Makefile
slot_paddr, SHADOW_TLB_SLOT_LOG2_SIZE,
[this, set_index, slot_index, vaddr_page, vp_offset, pma_index]() {
m_m.write_shadow_tlb(set_index, slot_index, vaddr_page, vp_offset, pma_index);
m_m.write_unverified_tlb(set_index, slot_index, vaddr_page, vp_offset, pma_index);
Collaborator


According to the coverage report downloaded from CI, we have no coverage for this line, nor for the do_write_tlb functions, in:

  • record_step_state_access
  • replay_step_state_access
  • uarch_record_state_access
  • uarch_replay_state_access

I would feel more confident if we had tests run by the CI that covered them (outside the fuzzer).

Contributor Author


Let me know if this summary helps:

Coverage improvement summary

Starting point

The CI coverage job was missing the test-machine-with-log-step target, which runs
cartesi-machine-tests.lua with the run_step command. This meant log_step() /
verify_step() were never exercised during coverage, leaving the entire
replay_step_state_access path (including do_write_tlb) at 0% for the big-machine
replay.

Adding test-machine-with-log-step to the CI coverage job brought
replay-step-state-access.h from 68.8% to 87.5% and functions from 60.2% to 81.9%
overall.

Identifying the next gaps

With the happy-path replay now covered, the remaining gaps were:

| File | Before | After adding log_step |
| --- | --- | --- |
| replay-send-cmio-state-access.h | 62.4% | 62.4% (unchanged) |
| uarch-replay-state-access.h | 84.0% | 84.0% (unchanged) |

The cmio replay verifier had zero coverage of its ~27 validation error paths. The uarch
replay had no coverage of TLB write verification. These are the paths that reject
invalid state transitions during fraud proof disputes.

What changed vs the original plan

The initial analysis proposed writing tests inspired by the old cheat/simple/
tournament code, which wrapped machines to produce dishonest state transitions. That
approach would have required building a valid machine abstraction and was more complex
than necessary.

Feedback during the session redirected the approach in several ways:

  1. No need for a dishonest machine wrapper. Instead of corrupting the machine and
    re-running, we produce a valid log and then corrupt the log itself before calling
    verify. This directly tests what matters: that the verifier rejects bad logs.

  2. Fix the JSON deserialization layer. The Lua-to-C++ serialization in
    json-util.cpp was silently validating and filtering access log data before the C++
    replay code could see it. Specifically:

    • The written field was dropped for read accesses, making the "unexpected written
      data in read access" error unreachable.
    • Size checks on read, written, and sibling_hashes data threw JSON-layer
      errors before the C++ replay validation could run.
    • A bug in the error message reported "written" when checking read data size.

    The fix was to remove all validation from the JSON layer and let the C++ replay code
    handle it. The JSON layer should faithfully deserialize; validation is the replay
    code's job, since that is the code that mirrors what runs on-chain.

  3. The "differs only by written word" error was reachable. Initially dismissed as
    unreachable because HASH_TREE_WORD_SIZE == sizeof(uint64_t). But the hash tree
    word size is actually 32 bytes, not 8. Register writes modify an 8-byte word within
    a 32-byte leaf. Corrupting a different 8-byte region in the leaf while preserving
    the written word triggers this check.

  4. Truncation tests at different points. The initial test only truncated the last
    access. Different truncation points hit different "too few accesses" checks in
    check_read, check_write, and do_write_memory_with_padding.

  5. Make posix.stdlib optional in test utilities. The test utility required
    posix.stdlib for realpath, which prevented running tests on macOS without the
    posix Lua module. The fix uses pcall and falls back to relative paths.

  6. Use the lester test framework. The test was initially standalone with its own
    harness. It was converted to use the project's lester framework and integrated
    into test-spec.lua, so it runs as part of the existing test-lua CI target.

verify_step failure tests (big-machine replay)

The big-machine replay verifier (replay_step_state_access) uses a page-based binary
step log format, unlike the uarch verifier which uses per-access JSON logs. Testing it
required a completely different approach: producing a valid binary log via log_step,
then surgically corrupting it before calling verify_step.

Feedback steered the design:

  1. Binary log corruption, not JSON corruption. Claude initially suggested reusing the
    uarch test pattern of corrupting JSON access logs. But verify_step takes a binary
    step log file, not a JSON access log. The test had to understand the binary format
    (root hashes, mcycle, hash function type, page entries with index/data/scratch-hash,
    sibling hashes) and corrupt specific fields.

  2. Build a log from scratch for structural tests. For tests like "too few sibling
    hashes at leaf level" or "too many pages", corrupting a valid log was insufficient.
    The test needed build_step_log and get_siblings_for_pages helpers to construct
    logs with precise page/sibling combinations.

  3. Page corruption must fool the root hash check. Claude initially proposed simply
    removing pages from a valid log. But verify_step computes the root hash from the
    logged pages and siblings before replaying -- a missing page would fail the root hash
    check, not the replay. To test "required page not found" during replay, the test had
    to recompute the root hash for the reduced page set, which required computing fresh
    sibling hashes from the machine's Merkle tree.

  4. Corrupt page data, not page presence, for replay errors. To trigger interpreter
    errors during replay (e.g., exception on a corrupted instruction), the test corrupts
    the page data while keeping the page in the log and recomputing the page's hash to
    match, then recomputing the root hash with fresh siblings.

Fixing record/replay TLB validation

The lazy TLB verification (commit 3ac97eb) introduced do_init_hot_tlb_slot which
validates shadow TLB entries on first access. But record_step_state_access had a bug:
it touched the PMA page only for valid entries, meaning replay could not validate corrupt
entries (it would fail trying to read PMA data that wasn't in the log).

Feedback identified this:

  1. Record must touch the PMA page unconditionally. Claude initially thought the
    existing code was already correct. But the record side was only touching the PMA page
    inside the if (vaddr_page != TLB_INVALID_PAGE) block. Corrupt TLB entries (garbage
    pma_index) still need the PMA page in the log so replay can call do_read_pma and
    detect the corruption.

  2. Validate first, then touch the target page. The old code touched the target page
    before validation. The fix reorders: validate via init_hot_tlb_slot, then touch the
    target page only if validation passed.

  3. Clamp out-of-bounds pma_index. pmas_get_abs_addr could compute an address
    outside the PMAs region for a corrupt pma_index. The fix clamps to a sentinel entry
    at PMA_MAX, which has zeroed istart/ilength and fails is_memory(), causing
    shadow_tlb_verify_slot to reject the entry.

  4. TLB thrash test. A dedicated RISC-V test binary (thrash-tlb.S) exercises all
    TLB sets by reading/writing addresses that map to every TLB slot, forcing evictions
    and re-validations. Pre/post Lua scripts set up corrupt shadow TLB entries and verify
    the machine handles them correctly during log_step/verify_step.

Marking genuinely unreachable code

During coverage analysis of replay_step_state_access, several code paths were
identified as genuinely unreachable despite being defensive checks. Each was marked with
LCOV_EXCL_START/LCOV_EXCL_STOP and a comment explaining why.

Claude initially proposed writing tests to cover some of these. Analysis during the
session proved they were unreachable:

  1. Leaf-level "too few sibling hashes" (line 391). To reach this code, we would
    need pages[next_page].index < page_index at leaf level. Claude initially tried to
    construct an adversarial log to hit this path. After tracing the recursion in
    compute_root_hash_impl, the proof emerged: pages are sorted and consumed
    left-to-right, so the next unconsumed page always has index >= the current leaf's
    page_index.

  2. find_page(host_addr) "required page not found" (line 324). The only caller is
    do_write_tlb, which receives vh_offset from the interpreter's page walk.
    vh_offset is computed from do_get_faddr, which already called
    find_page(uint64_t) successfully for the same page. Claude initially proposed
    removing the host_addr overloads as dead code left over from the pre-lazy-TLB bulk
    relocation (commit 3ac97eb). But do_write_tlb at line 562 does call
    find_page(host_addr) to reverse-translate vh_offset back to a physical address --
    the overloads are needed, just their error path is unreachable.

  3. do_read_memory / do_write_memory / do_putchar. These are required by the
    i_state_access CRTP interface but never called during step replay.

  4. Page data ordering check (line 248). Pages are stored contiguously in the parsed
    log, so their data addresses are always in increasing order by construction. The
    check guards against a hypothetical future where page data is independently
    allocated. This was already marked with LCOV_EXCL using LCOV_EXCL_END instead
    of LCOV_EXCL_STOP -- gcovr requires LCOV_EXCL_STOP.

Changes made

src/json-util.cpp

  • Removed log2_size >= 64 bounds check from access log deserialization.
  • Removed read/written data size validation.
  • Removed sibling_hashes depth validation.
  • Removed the type == write guard on deserializing the written field.
  • Fixed error message that said "written" when checking read data size.

src/pmas.h

  • pmas_get_abs_addr clamps out-of-bounds pma_index to a sentinel entry at PMA_MAX.
  • Added static_assert that there is room for the sentinel entry.

src/machine-address-ranges.cpp

  • Added bounds check on PMA count in push_back.

src/machine.cpp

  • Added static_assert and runtime check for PMA count in init_pmas_contents.

src/record-step-state-access.h

  • do_init_hot_tlb_slot: touch PMA page unconditionally (before validation), validate
    first via init_hot_tlb_slot, then touch target page only if valid.

src/replay-step-state-access.h

  • Added LCOV_EXCL_START/LCOV_EXCL_STOP with explanatory comments for five
    unreachable code paths: page data ordering check, find_page(host_addr) error,
    leaf-level sibling check, do_read_memory/do_write_memory, and do_putchar.
  • Fixed LCOV_EXCL_END -> LCOV_EXCL_STOP (gcovr requires STOP, not END).
  • do_write_tlb: write zero_padding field to shadow TLB.

src/replay-send-cmio-state-access.h

  • Added LCOV_EXCL markers for six lines that are genuinely unreachable through the
    current API (aligned address checks, empty-log read check, null data check, address
    mismatch in check_read).
  • Fixed error message capitalization.

src/clua-cartesi.cpp, src/machine-c-api.cpp, src/machine-c-api.h

  • Exposed AR_SHADOW_STATE_START, AR_SHADOW_STATE_LENGTH, AR_PMAS_START,
    AR_PMAS_LENGTH constants through Lua and C APIs for use by TLB validation tests.

tests/lua/cartesi/tests/util.lua

  • Made posix.stdlib optional via pcall. Falls back to relative paths.

tests/lua/machine-bind.lua

  • Updated error expectations in verify_reset_uarch and verify_step_uarch unhappy
    path tests to match the C++ replay errors instead of the removed JSON-layer errors.

tests/lua/spec-verify-uarch-failure.lua (new)

67 tests exercising every reachable validation error path:

  • verify_step_uarch (30 tests): basic step corruptions (empty log, extra access,
    wrong type/address/log2_size, corrupt data/hashes/siblings, ordinal formatting for
    1st-4th accesses, wrong final hash) and TLB write corruptions via the
    ecall-write-tlb test binary (wrong type/address, corrupt siblings, wrong/missing
    written_hash, corrupt read/written data).
  • verify_send_cmio_response (37 tests): check_read errors (7 tests),
    do_write_memory_with_padding errors (8 tests), check_write errors (12 tests),
    log structure errors (5 tests including truncation at three different points),
    ordinal coverage, zero-length data path, and wrong final hash.

tests/lua/spec-verify-step-failure.lua (new)

24 tests exercising the binary step log verifier:

  • Log parsing errors (9 tests): truncation at each field boundary (root hash before,
    mcycle count, root hash after, hash function type, page count, sibling count, sibling
    hashes), extra trailing data, and unsupported hash function type.
  • Page validation errors (4 tests): out-of-order page indices, non-zero scratch hash,
    extra pages beyond the log's page count, and too many pages in the Merkle tree
    reconstruction.
  • Sibling hash errors (3 tests): too few siblings at internal level, too few siblings
    at leaf level, and too many siblings.
  • Root hash / replay errors (5 tests): initial root hash mismatch, wrong mcycle
    count, wrong root hash after, missing page during replay (requiring recomputation of
    sibling hashes for a reduced page set), and corrupt page data causing an interpreter
    exception during replay.
  • Hash function coverage (3 tests): all tests run with both SHA-256 and Keccak-256,
    plus an explicit test for unsupported hash function type.

tests/lua/pre-thrash-tlb.lua, tests/lua/post-thrash-tlb.lua (new)

Setup and verification scripts for the TLB thrash test, which exercises log_step /
verify_step with corrupt shadow TLB entries across all TLB sets.

tests/machine/src/thrash-tlb.S (new)

RISC-V test binary that reads and writes addresses mapping to every TLB slot, forcing
evictions and re-validations.

tests/lua/test-spec.lua

  • Added require("spec-verify-step-failure") and require("spec-verify-uarch-failure")
    so both tests run as part of test-lua.

.github/workflows/build.yml

  • Added test-machine-with-log-step to the coverage CI job.

tests/Makefile

  • Fixed build-tests-all to work inside the container without forcing fuzzer seed corpus
    generation.

Coverage results

Overall (coverage -> coverage2):

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Lines | 18578/23339 (79.6%) | 18827/23312 (80.7%) | +249, +1.1% |
| Functions | 3827/6361 (60.2%) | 5227/6358 (82.2%) | +1400, +22.0% |
| Branches | 8985/20588 (43.6%) | 9377/20506 (45.7%) | +392, +2.1% |

Critical verification files:

| File | Before | After |
| --- | --- | --- |
| replay-step-state-access.h | 68.8% | 100.0% |
| replay-send-cmio-state-access.h | 62.4% | 97.8% |
| uarch-replay-state-access.h | 84.0% | 99.4% |
| uarch-record-state-access.h | 85.4% | 94.2% |
| record-step-state-access.h | 53.7% | 82.9% |
| shadow-tlb.h | 40.4% | 55.3% |
| shadow-uarch-state.h | 31.3% | 41.0% |
| pmas.h | 60.3% | 64.9% |
| address-range.h | 76.9% | 77.8% |
| machine.cpp | 86.9% | 87.9% |

Remaining uncovered lines

In replay-send-cmio-state-access.h (4 uncovered, all marked LCOV_EXCL):

  • "address not aligned to word size" -- register addresses are always 8-byte aligned.
  • "too few accesses in log" in check_read -- the constructor catches empty logs
    first, and the only read (iflags.Y) is always first.
  • Address mismatch in check_read -- only one read address is used (iflags.Y).

In uarch-replay-state-access.h (1 uncovered):

  • return "unknown_" in access_type_name -- only read and write exist.

In replay-step-state-access.h (0 uncovered after LCOV_EXCL markers):

  • All remaining uncovered paths are marked with LCOV_EXCL and documented with proofs
    of unreachability (see "Marking genuinely unreachable code" above).

Many not-taken branches in the coverage report are GCC's exception-handling machinery
(implicit branches for std::string allocation failure inside throw expressions),
not real logic branches.

Comment thread src/interpret.cpp
// In contrast, a STATE_ACCESS that does not have access to hot out-of-state slots cannot mark TLB slots
// as not-yet-initialized.
// We must verify the cold slot at every hit and treat inconsistent entries as misses
if (!a.template verify_cold_tlb_slot<TLB_READ>(slot_index)) [[unlikely]] {
Collaborator


According to the coverage report downloaded from CI, we have no coverage for the case when verify_cold_tlb_slot fails (for state accessors that implement it) and execution falls into this if, for:

  • fetch_translate_pc
  • read_virtual_memory
  • write_virtual_memory

I would feel more confident if we had tests run by the CI that covered them (outside the fuzzer).
It could be a simple test that thrashes the TLB on purpose.

Contributor Author


The only state access that implements verify_cold_tlb_slot (other than simply returning true) is the one in the uarch bridge, used to compile interpret into uarch.bin so it can run inside the uarch. So unless we add a way to extract coverage from interpret while it is running inside the uarch, the coverage will not show...

Contributor Author


Guess I will try that. :)

Contributor Author

@diegonehab diegonehab Apr 7, 2026


Uarch coverage collection

Motivation

The emulator's interpret() function is compiled twice: once for the host
(with gcov instrumentation), and once as a RISC-V binary that runs inside the
microarchitecture emulator (uarch-ram.bin). The host coverage report misses
code paths that are only exercised inside the uarch -- most notably
machine-uarch-bridge-state-access.h (which is never compiled into the host)
and the failure branch of verify_cold_tlb_slot() in interpret.cpp (which
requires the bridge state access to trigger).

How it works

1. Separate coverage uarch binary

The production uarch-ram.bin is compiled with -O2 -g0 for performance.
For coverage, a separate uarch-ram-coverage.bin is built alongside it
(when coverage=yes) with -O0 -g -DCODE_COVERAGE. This gives:

  • Full debug info for accurate addr2line PC-to-source mapping
  • No inlining (CODE_COVERAGE disables FORCE_INLINE, which otherwise
    uses __attribute__((always_inline)) and defeats -fno-inline)
  • No --gc-sections (which can strip debug sections)

Both binaries are built from the same source using separate object files
(.cov_cpp.o / .cov_c.o suffixes) so they don't interfere.

The production binary is used for all normal tests. The coverage binary is
only loaded for the run_uarch_coverage tests via --uarch-ram-image.

2. PC collection during test runs

The run_uarch_coverage command in cartesi-machine-tests.lua runs tests
through the uarch interpreter one cycle at a time, reading uarch_pc before
each cycle and collecting unique PCs into a Lua table. After each test, the
PCs are written to a .pcs file (one hex address per line) in the directory
specified by --uarch-pc-output-dir.

The test-coverage-uarch-pcs Makefile target runs the csr and thrash-tlb
tests in this mode. This target is separate from test-coverage-uarch
(which runs the validation tests without PC collection) so that non-coverage
CI jobs (e.g. sanitize) don't need the coverage binary.

Tests with pre/post scripts (like the thrash-tlb corruption test) get a hash
suffix in the .pcs filename to avoid collisions with the plain version.

3. Resolving PCs to source lines

The tests/scripts/uarch-pcs-to-gcov.lua script resolves the collected PCs
to source file, function name, and line number using addr2line -f against
uarch/uarch-ram-coverage.elf.

The script handles DWARF path resolution in three cases:

  • Direct match: DWARF paths match the local gcov_dir prefix (e.g.
    /usr/src/emulator/src/interpret.cpp on CI). The prefix is stripped to
    get the bare filename.

  • Project root match: DWARF paths are under the project root but outside
    gcov_dir (e.g. /usr/src/emulator/uarch/machine-uarch-bridge-state-access.h).
    The project root is computed from gcov_dir and the path is made relative
    (e.g. ../uarch/machine-uarch-bridge-state-access.h).

  • Basename fallback: DWARF paths don't match the local tree at all (e.g.
    the ELF was built inside Docker with paths like /opt/cartesi/...). The
    script extracts the basename and checks if the file exists under uarch/
    or src/ in the local tree.

Paths outside the project (e.g. C++ standard library headers) are filtered
out. If addr2line is not available, the script exits gracefully and the
report is generated without uarch data.

4. Running gcov with proper merging

The tests/scripts/run-gcov.lua script runs gcov (or llvm-cov gcov)
on each .gcda file individually and merges the resulting .gcov files.

This works around a bug in llvm-cov gcov: when processing multiple .gcda
files that share headers, it overwrites the .gcov file for each shared
header rather than accumulating counts. GNU gcov merges correctly but the
script works with both.

The merge adds execution counts from all versions of each source line, and
prefers ##### (uncovered but executable) over - (non-executable) for
lines that appear in only some compilation units.

5. Merging uarch coverage into .gcov files

After run-gcov.lua produces the host .gcov files, uarch-pcs-to-gcov.lua
modifies them before gcovr reads them:

  • Existing .gcov files (e.g. interpret.cpp.gcov): lines marked as
    uncovered (#####) that were hit by the uarch get their count replaced.
    Lines already marked as executed by the host get the uarch count added.

  • New .gcov files (e.g. for machine-uarch-bridge-state-access.h):
    created from scratch with function records (required by gcovr to recognize
    executable lines) and line hit counts. Non-hit lines are marked as
    non-executable (-) since there is no way to determine which lines the
    compiler considers executable without gcov instrumentation data.

6. Generating the report

gcovr --use-gcov-files reads all .gcov files from src/ and produces
the HTML report and text summary. The --filter flags include both src/
and uarch/ directories to pick up the bridge header and other uarch-only
source files.

On systems without the RISC-V toolchain, the uarch-pcs-to-gcov.lua script
runs inside the toolchain Docker container (which has
riscv64-unknown-elf-addr2line). The gcov and gcovr steps run on the host.

7. Unified coverage toolchain

On macOS, clang coverage now uses --coverage (gcc-compatible .gcno/.gcda
format) instead of -fprofile-instr-generate -fcoverage-mapping. This means
the same gcov/gcovr pipeline works on both Linux (gcc) and macOS (clang),
and LCOV_EXCL_START/LCOV_EXCL_STOP markers are respected on both
platforms. The COVERAGE_TOOLCHAIN variable is exported from tests/Makefile
so sub-makes inherit the correct value.

Limitations

  • For source files that exist only in the uarch binary (like the bridge
    header), all non-hit lines appear as non-executable in the report. This
    means the report shows which lines were executed, but cannot show which
    lines should have been executed but were not.

  • Even with -O0, the coverage binary is a different compilation from the
    host. Template instantiations may differ, so some lines in shared headers
    might not be attributed identically.

Running locally

From a clean checkout:

make submodules
make -j$(nproc) coverage=yes
make -C tests build-tests-machine-with-toolchain coverage=yes
make -C tests build-tests-misc coverage=yes
make -C tests build-tests-uarch-with-toolchain coverage=yes
make -C tests build-tests-images coverage=yes
eval $(make env)
make -C tests -j1 coverage=yes \
    test-save-and-load \
    test-machine \
    test-lua \
    test-c-api \
    test-coverage-machine \
    test-uarch-rv64ui \
    test-uarch-interpreter \
    test-coverage-uarch \
    test-coverage-uarch-pcs \
    test-machine-with-log-step
make -C tests coverage-report coverage=yes
# Report at tests/build/coverage/gcc/index.html

To regenerate just the report (after tests have already run):

make -C tests coverage-report coverage=yes

Files

  • uarch/Makefile -- builds both uarch-ram.bin (production) and
    uarch-ram-coverage.bin (with -O0 -g -DCODE_COVERAGE) when coverage=yes
  • tests/lua/cartesi-machine-tests.lua -- run_uarch_coverage command and
    --uarch-pc-output-dir / --uarch-ram-image options
  • tests/scripts/run-gcov.lua -- runs gcov per .gcda with proper merging
  • tests/scripts/uarch-pcs-to-gcov.lua -- resolves PCs and merges into
    .gcov files
  • tests/scripts/generate-coverage-report.sh -- standalone script for
    running the full coverage pipeline
  • tests/Makefile -- test-coverage-uarch (validation tests),
    test-coverage-uarch-pcs (PC collection), coverage-report (report
    generation)
  • .github/workflows/build.yml -- CI coverage job runs both targets

Contributor Author


Done.

Collaborator


Amazing 💯

Comment thread src/interpret.cpp Outdated
Comment thread src/riscv-warl.h
@github-project-automation github-project-automation Bot moved this from Waiting Review to In Progress in Machine Unit Mar 31, 2026
Instead of using TLB_INVALID_PAGE, the correct invalidation is ~pc.
This ensures the xor trick in fetch_insn doesn't fail when pc is in
the last page of virtual memory.
@diegonehab diegonehab force-pushed the feature/fuzz branch 18 times, most recently from be7b44f to 1558faa on April 8, 2026 19:39
@diegonehab diegonehab requested a review from edubart April 8, 2026 20:20
@github-project-automation github-project-automation Bot moved this from In Progress to Waiting Merge in Machine Unit Apr 8, 2026
Collaborator

@edubart edubart left a comment


For me it is good enough already, good work!

@diegonehab diegonehab merged commit 384ec21 into main Apr 9, 2026
9 checks passed
@diegonehab diegonehab deleted the feature/fuzz branch April 9, 2026 08:17
@github-project-automation github-project-automation Bot moved this from Waiting Merge to Done in Machine Unit Apr 9, 2026
@edubart edubart mentioned this pull request Apr 9, 2026
@edubart edubart added the enhancement New feature or request label Apr 9, 2026