Fuzzing MLIR compilers with Custom Mutation Synthesis, Ben Limpanukorn, Jiyuan Wang, Hong Jin Kang, Eric Zitong Zhou, Miryung Kim, 47th International Conference on Software Engineering (ICSE '25) 12 pages https://arxiv.org/abs/2404.16947
A video recording of the slides presented at ICSE 2025 is available here: https://youtu.be/lQ4vZQF2BnQ
This artifact including code and data can be obtained on FigShare: https://doi.org/10.6084/m9.figshare.25458925
The code is also available on GitHub at: https://github.com/UCLA-SEAL/SynthFuzz
A preprint has been made available at: https://arxiv.org/abs/2404.16947
This artifact was tested on a machine with an AMD Ryzen 2950X CPU with 32 GB of RAM.
Before running this artifact, please install Docker (see the official Docker installation instructions). Please also ensure that this artifact is extracted to a directory whose absolute path does not contain spaces.
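The no-spaces requirement can be checked before building the image. The snippet below is a minimal sketch and is not part of the artifact; the helper name path_has_spaces is hypothetical, and it assumes the current working directory is where the artifact was extracted.

```python
import os


def path_has_spaces(path: str) -> bool:
    """Return True if the absolute form of `path` contains a space character."""
    return " " in os.path.abspath(path)


if __name__ == "__main__":
    # Assumption: the script is run from the directory the artifact was extracted to.
    here = os.getcwd()
    if path_has_spaces(here):
        print(f"Please move the artifact: '{here}' contains spaces.")
    else:
        print(f"OK: '{here}' contains no spaces.")
```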
- Build the Docker image either by running ./docker/build_from_scratch.sh to build the image from scratch, or by running ./docker/build_with_prebuilt.sh to download a pre-built image from DockerHub.
- Start the container by running ./docker/run_default.sh
- Enter the container by running ./docker/attach.sh. All commands after this point should be run inside the container.
An example script and data have been provided under the example directory to demonstrate how to use SynthFuzz.
SynthFuzz is implemented as an extension to Grammarinator and can be used in the same fashion.
To run this example, simply run the example script:
cd example
./example.sh
This script will execute the following commands:
First, the grammar is processed to produce a test generator. For SynthFuzz, this step also generates an insert_patterns.pkl file, which the fuzzer uses to determine which production rules contain quantifiers.
python -m mlirmut.synthfuzz.process mlir_2023.g4 --rule start_rule -o mlirgen
- The --rule <rule_name> option sets the starting/root production rule of the grammar. For mlir_2023.g4, the starting rule is named start_rule.
- The -o <output_dir> option sets the directory in which to write the insert_patterns.pkl file.
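To make "production rules that contain quantifiers" concrete, the sketch below scans a toy grammar for the ANTLR quantifiers *, + and ?. It is an illustration only: it is not how mlirmut.synthfuzz.process is implemented, the rule names are made up rather than taken from mlir_2023.g4, and it says nothing about the actual contents of insert_patterns.pkl.

```python
import re

# Hypothetical toy rules in ANTLR-like syntax (not taken from mlir_2023.g4).
TOY_RULES = {
    "block": "label operation+",
    "operation": "result_list? op_name operands",
    "op_name": "IDENT '.' IDENT",
    "star_literal": "'*' IDENT",
}

# Matches single-quoted literals (including backslash escapes) so that
# quantifier characters appearing inside them are ignored.
LITERAL = re.compile(r"'(?:\\.|[^'\\])*'")


def has_quantifier(rule_body: str) -> bool:
    """Return True if the rule body uses *, + or ? outside of quoted literals."""
    stripped = LITERAL.sub("", rule_body)
    return any(q in stripped for q in "*+?")


quantified = sorted(name for name, body in TOY_RULES.items() if has_quantifier(body))
print(quantified)  # prints ['block', 'operation']
```

A real grammar also contains character sets, comments and embedded actions, which this toy scan does not handle.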
Then, using Grammarinator, the inputs are parsed using the grammar to generate parse-trees which are used as seed inputs for the fuzzer:
grammarinator-parse \
-r start_rule \
-i inputs/*.mlir \
-o trees \
mlir_2023.g4
- The -r <rule_name> option sets the starting/root production rule of the grammar. For mlir_2023.g4, the starting rule is named start_rule.
- The -i <inputs> option takes the paths to the seed inputs to be parsed.
- The -o <output_dir> option sets the directory to output the parse-trees of the given seed inputs.
Finally, the fuzzer can be used to generate inputs:
python -m mlirmut.synthfuzz.generate \
mlir_2023Generator.mlir_2023Generator \
-r start_rule \
-d 100 \
-o outputs/%d.mlir \
-n 10 \
--sys-path mlirgen \
--population trees \
--insert-patterns mlirgen/insert_patterns.pkl \
--mutation-config mutation_config.toml \
--keep-trees \
--no-generate --no-recombine --no-mutate \
--k-ancestors=4 --l-siblings=4 --r-siblings=4
- The -r <rule_name> option sets the starting/root production rule of the grammar. For mlir_2023.g4, the starting rule is named start_rule.
- The -d 100 option sets the maximum depth of a mutated parse-tree to 100. This limits t
- The -o outputs/%d.mlir option sets the path and filename pattern for the generated fuzz inputs.
- The -n 10 option sets the target number of generated fuzz inputs to 10.
- The --sys-path mlirgen option takes the path to the output directory specified earlier for the python -m mlirmut.synthfuzz.process command. This is only needed if you use the --generate option to use Grammarinator's grammar-based generation strategy.
- The --population trees option takes the path to the parse-trees of the seed inputs. This should be the output directory of the previous grammarinator-parse command.
- The --insert-patterns mlirgen/insert_patterns.pkl option takes the path to the insert_patterns.pkl file generated by the python -m mlirmut.synthfuzz.process command. This is used by SynthFuzz to determine possible insertion locations in the parse-tree.
- The --mutation-config mutation_config.toml option takes a configuration file that specifies optional heuristics to improve mutation for a specific grammar/domain. More information about the available configuration options can be found in the example file: example/mutation_config.toml.
- The --keep-trees option adds any mutated seed inputs to the seed corpus after generation. This enables SynthFuzz to apply multiple mutations to the same input over time.
- The --no-generate --no-recombine --no-mutate options disable the vanilla Grammarinator generation strategies so that this example demonstrates only SynthFuzz's mutation strategy.
- The --k-ancestors=4 --l-siblings=4 --r-siblings=4 options configure the maximum amount of context to match when selecting a location to insert/mutate in the parse tree of a seed test case. A reasonable default is to set all values to 4, as shown in this example.
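The role of the three context limits can be illustrated on a toy parse tree. The sketch below is not SynthFuzz's code: the Node class and the context_of helper are hypothetical, and it assumes that k bounds the number of ancestors while l and r bound the number of left and right siblings considered around a location.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity-based equality; avoids recursive field comparison
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach `child` to this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def context_of(node: Node, k: int, l: int, r: int) -> Tuple[List[str], List[str], List[str]]:
    """Collect up to k ancestors (nearest first), l left siblings and r right siblings."""
    ancestors: List[str] = []
    cur = node.parent
    while cur is not None and len(ancestors) < k:
        ancestors.append(cur.name)
        cur = cur.parent
    left: List[str] = []
    right: List[str] = []
    if node.parent is not None:
        sibs = node.parent.children
        idx = next(i for i, s in enumerate(sibs) if s is node)
        left = [s.name for s in sibs[max(0, idx - l):idx]]
        right = [s.name for s in sibs[idx + 1:idx + 1 + r]]
    return ancestors, left, right


# Toy tree: module -> func -> op0 .. op5
root = Node("module")
func = root.add(Node("func"))
ops = [func.add(Node(f"op{i}")) for i in range(6)]
print(context_of(ops[3], k=4, l=2, r=1))
# prints (['func', 'module'], ['op1', 'op2'], ['op4'])
```

Under these assumptions, larger limits mean more surrounding structure is taken into account when matching a location, while smaller limits make more locations eligible.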
For more information regarding SynthFuzz-specific command line options, run: python -m mlirmut.synthfuzz.generate --help
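The %d in the -o outputs/%d.mlir pattern used above stands for the index of each generated input. The sketch below shows the expansion using Python's printf-style formatting; the helper name is hypothetical, and it assumes indices start at 0, which is worth verifying against the files the example actually produces.

```python
def expand_output_pattern(pattern: str, count: int, start: int = 0) -> list:
    """Expand a printf-style pattern such as 'outputs/%d.mlir' for `count` indices."""
    return [pattern % i for i in range(start, start + count)]


paths = expand_output_pattern("outputs/%d.mlir", 3)
print(paths)  # prints ['outputs/0.mlir', 'outputs/1.mlir', 'outputs/2.mlir']
```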
The post-processed branch and dialect pair coverage has been included with this artifact for convenience.
If you would like to reproduce the results from scratch, delete the data directory and follow the directions in the Running Experiments and Collecting Coverage From Scratch section before continuing with this section. Note that running the experiments from scratch may take several days depending on your machine.
All commands should be run inside the Docker container.
cd /synthfuzz
python figures-tables/coverage.py
cd /synthfuzz
python figures-tables/diversity.py
cd /synthfuzz
python figures-tables/ablation-context.py
cd /synthfuzz
python figures-tables/ablation-params.py
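The four figure and table scripts above can also be run in one go. The following is a convenience sketch, not something shipped with the artifact: the function names are hypothetical, and it assumes it is run inside the container with the artifact located at /synthfuzz.

```python
import subprocess
import sys

# Script names taken from the commands above.
FIGURE_SCRIPTS = ["coverage.py", "diversity.py", "ablation-context.py", "ablation-params.py"]


def figure_commands(python: str = sys.executable) -> list:
    """Build one argument list per figure/table script."""
    return [[python, f"figures-tables/{name}"] for name in FIGURE_SCRIPTS]


def run_all(cwd: str = "/synthfuzz") -> None:
    """Run every script from the artifact root, stopping at the first failure."""
    for cmd in figure_commands():
        # check=True raises CalledProcessError if a script exits with a non-zero status.
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling run_all() from a Python session inside the container would then execute the scripts in the order listed.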
This section is only required if you would like to re-generate the data directory from scratch. Total estimated time required: ~70 hours.
- Compile each subject program (~10 hours) with coverage enabled:
cd /synthfuzz/eval
# build mlir-opt
./mlir/build_mlir.sh
# build onnx-mlir-opt
./onnx/build_onnx_mlir.sh
# build triton-opt
./triton/build.sh
# build circt-opt
./circt/build_circt.sh
- Extract seed test cases from each subject's repositories (~10 minutes):
cd /synthfuzz/eval
./mlir/find_seeds.sh
./onnx/find_seeds.sh
./triton/find_seeds.sh
./circt/find_seeds.sh
- Optional, only if you want to evaluate against NeuRI: for this step only, NeuRI needs to be run in its own container. Run the following outside the synthfuzz-artifact-icse2025 container:
cd synthfuzz-icse2025/eval/neuri
./start_docker.sh
./gen_indocker.sh # inside the neuri-artifact container
Now, returning to the synthfuzz-artifact-icse2025 container:
cd /synthfuzz/eval/neuri
python copy_models.py
python tf_to_onnx.py
python onnx_to_mlir.py
python onnx_to_onnx_mlir.py
- Run each experiment. Each fuzzing run is allocated 4 hours. Additionally, indexing and merging the coverage profiles may take significant memory and time (1-2 hours per experiment). In total, this may require ~50-60 hours depending on the CPU and memory available.
# install computepairs
cd /synthfuzz/computepairs
go install
# ablation
cd /synthfuzz/eval/mlir/ablation/context && ./run.sh
cd /synthfuzz/eval/mlir/ablation && ./no_parameters.sh
cd /synthfuzz/eval/mlir/ablation && ./with_parameters.sh
# Coverage experiments
cd /synthfuzz/eval/mlirsmith && ./run.sh
cd /synthfuzz/eval/mlir/baseline && ./run.sh
cd /synthfuzz/eval/mlir/synthfuzz && ./run.sh
cd /synthfuzz/eval/mlir/grammarinator && ./run.sh
cd /synthfuzz/eval/mlir/mlirsmith && ./run.sh
cd /synthfuzz/eval/onnx/baseline && ./run.sh
cd /synthfuzz/eval/onnx/synthfuzz && ./run.sh
cd /synthfuzz/eval/onnx/grammarinator && ./run.sh
cd /synthfuzz/eval/onnx/mlirsmith && ./run.sh
cd /synthfuzz/eval/triton/baseline && ./run.sh
cd /synthfuzz/eval/triton/synthfuzz && ./run.sh
cd /synthfuzz/eval/triton/grammarinator && ./run.sh
cd /synthfuzz/eval/triton/mlirsmith && ./run.sh
cd /synthfuzz/eval/circt/baseline && ./run.sh
cd /synthfuzz/eval/circt/synthfuzz && ./run.sh
cd /synthfuzz/eval/circt/grammarinator && ./run.sh
cd /synthfuzz/eval/circt/mlirsmith && ./run.sh
# Only if the optional NeuRI step was followed:
cd /synthfuzz/eval/mlir/neuri && ./run.sh
cd /synthfuzz/eval/onnx/neuri && ./run.sh
cd /synthfuzz/eval/triton/neuri && ./run.sh
cd /synthfuzz/eval/circt/neuri && ./run.sh
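The per-subject coverage runs above follow a regular subject/fuzzer layout, which the sketch below enumerates. It is an illustration only: the helper name is hypothetical, and it covers neither the ablation scripts nor the /synthfuzz/eval/mlirsmith run, which do not fit the pattern.

```python
from itertools import product

# Directory names taken from the commands above.
SUBJECTS = ["mlir", "onnx", "triton", "circt"]
FUZZERS = ["baseline", "synthfuzz", "grammarinator", "mlirsmith"]


def coverage_run_dirs(include_neuri: bool = False, root: str = "/synthfuzz/eval") -> list:
    """List the experiment directories in the same order as the commands above."""
    dirs = [f"{root}/{subject}/{fuzzer}" for subject, fuzzer in product(SUBJECTS, FUZZERS)]
    if include_neuri:
        # NeuRI runs apply only if the optional NeuRI step was followed.
        dirs += [f"{root}/{subject}/neuri" for subject in SUBJECTS]
    return dirs


for directory in coverage_run_dirs():
    print(f"cd {directory} && ./run.sh")
```

Printing the commands rather than executing them keeps the sketch side-effect free; the output can be compared against the list above before running anything.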