Paper

Fuzzing MLIR compilers with Custom Mutation Synthesis, Ben Limpanukorn, Jiyuan Wang, Hong Jin Kang, Eric Zitong Zhou, Miryung Kim, 47th International Conference on Software Engineering (ICSE '25) 12 pages https://arxiv.org/abs/2404.16947

A video recording of the slides presented at ICSE 2025 is available here: https://youtu.be/lQ4vZQF2BnQ

Provenance

This artifact, including code and data, can be obtained from FigShare: https://doi.org/10.6084/m9.figshare.25458925

The code is also available on GitHub at: https://github.com/UCLA-SEAL/SynthFuzz

A preprint has been made available at: https://arxiv.org/abs/2404.16947

Setup

Hardware

This artifact was tested on a machine with an AMD Ryzen Threadripper 2950X CPU and 32 GB of RAM.

Software

Before running this artifact, please install Docker (see the official Docker installation instructions). Please also ensure that this artifact is extracted to a directory whose absolute path does not contain spaces.
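
The artifact must sit under an absolute path without spaces, so you may want to check this before building. The following Python snippet is a minimal sketch and is not part of the artifact. The helper name path_has_spaces and the script usage are hypothetical.

```python
import os
import sys


def path_has_spaces(path: str) -> bool:
    """Hypothetical helper (not shipped with SynthFuzz): True if the absolute path contains a space."""
    return " " in os.path.abspath(path)


if __name__ == "__main__":
    # Check the directory given on the command line, or the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    resolved = os.path.abspath(target)
    if path_has_spaces(target):
        print(f"Problem: {resolved} contains spaces; extract the artifact elsewhere.")
    else:
        print(f"OK: {resolved} contains no spaces.")
```

For example, saving it as check_path.py (a hypothetical file name) and running python check_path.py /path/to/artifact reports whether that location is suitable.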

Build the Docker image either by running ./docker/build_from_scratch.sh to build it from scratch, or by running ./docker/build_with_prebuilt.sh to download a prebuilt image from Docker Hub.
Start the container by running ./docker/run_default.sh.
Enter the container by running ./docker/attach.sh. All commands after this point should be run inside the container.

Usage: Example

An example script and data have been provided under the example directory to demonstrate how to use SynthFuzz.

SynthFuzz is implemented as an extension to Grammarinator and can be used in the same fashion.

To run this example, simply run the example script:

cd example
./example.sh

This script executes the following commands:

First, the grammar is processed to produce a test generator. For SynthFuzz, this step also generates an insert_patterns.pkl file, which the fuzzer uses to determine which production rules contain quantifiers.

python -m mlirmut.synthfuzz.process mlir_2023.g4 --rule start_rule -o mlirgen

The --rule <rule_name> option sets the starting/root production rule of the grammar. For mlir_2023.g4, the starting rule is named start_rule.
The -o <output_dir> option sets the directory in which to write the generated test generator and the insert_patterns.pkl file.
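
The format of insert_patterns.pkl is not described here, so the following is only a toy illustration of finding production rules that contain quantifiers. It is a minimal Python sketch that assumes simplified ANTLR-style parser-rule bodies stored as strings. It looks for the *, + and ? quantifiers outside quoted literals. The function name rules_with_quantifiers and the toy grammar are hypothetical and do not reflect SynthFuzz's actual implementation.

```python
import re

# Single-quoted ANTLR literals such as '*' or '+' must not be mistaken for quantifiers.
LITERAL = re.compile(r"'(?:\\.|[^'\\])*'")
# A quantifier character, excluding the '+=' list-label operator (e.g. ids+=ID).
QUANTIFIER = re.compile(r"[*+?](?!=)")


def rules_with_quantifiers(rules: dict[str, str]) -> set[str]:
    """Hypothetical helper: names of rules whose body uses *, + or ? outside literals."""
    found = set()
    for name, body in rules.items():
        stripped = LITERAL.sub("", body)  # drop quoted literals first
        if QUANTIFIER.search(stripped):
            found.add(name)
    return found


if __name__ == "__main__":
    # Toy rule bodies, loosely MLIR-flavoured; not taken from mlir_2023.g4.
    toy_grammar = {
        "block": "'{' operation* '}'",
        "operation": "result_list? op_name '(' operand_list ')'",
        "operand_list": "operand (',' operand)*",
        "op_name": "'arith.addi'",
        "mul_op": "'*'",
    }
    print(sorted(rules_with_quantifiers(toy_grammar)))
    # prints ['block', 'operand_list', 'operation']
```

Intuitively, a quantified element such as operation* may occur a varying number of times. A subtree of that kind can therefore be added at such a location without violating the grammar.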

Then, Grammarinator parses the seed inputs with the grammar to produce parse trees, which the fuzzer uses as its seed inputs:

grammarinator-parse \
    -r start_rule \
    -i inputs/*.mlir \
    -o trees \
    mlir_2023.g4

The -r <rule_name> option sets the starting/root production rule of the grammar. For mlir_2023.g4, the starting rule is named start_rule.
The -i <inputs> option takes the paths to the seed inputs to be parsed.
The -o <output_dir> option sets the directory to output the parse-trees of the given seed inputs.

Finally, the fuzzer can be used to generate inputs:

python -m mlirmut.synthfuzz.generate \
    mlir_2023Generator.mlir_2023Generator \
    -r start_rule \
    -d 100 \
    -o outputs/%d.mlir \
    -n 10 \
    --sys-path mlirgen \
    --population trees \
    --insert-patterns mlirgen/insert_patterns.pkl \
    --mutation-config mutation_config.toml \
    --keep-trees \
    --no-generate --no-recombine --no-mutate \
    --k-ancestors=4 --l-siblings=4 --r-siblings=4

The -r <rule_name> option sets the starting/root production rule of the grammar. For mlir_2023.g4, the starting rule is named start_rule.
The -d 100 option sets the maximum depth of a mutated parse-tree to 100. This limits how deeply nested a mutated input can become.
The -o outputs/%d.mlir option sets the path and filename pattern for the generated fuzz inputs.
The -n 10 option sets the target number of generated fuzz inputs to 10.
The --sys-path mlirgen option takes the path to the output directory specified earlier for the python -m mlirmut.synthfuzz.process command. This is only needed if you enable Grammarinator's grammar-based generation strategy with the --generate option.
The --population trees option takes the path to the parse-trees of the seed inputs. This should be the output directory of the previous grammarinator-parse command.
The --insert-patterns mlirgen/insert_patterns.pkl option takes the path to the insert_patterns.pkl file generated by the python -m mlirmut.synthfuzz.process command. SynthFuzz uses this file to determine possible insertion locations in the parse-tree.
The --mutation-config mutation_config.toml option takes a configuration file that specifies optional heuristics to improve mutation for a specific grammar or domain. More information about the available configuration options can be found in the example file: example/mutation_config.toml.
The --keep-trees option adds any mutated seed inputs to the seed corpus after generation. This enables SynthFuzz to apply multiple mutations to the same input over time.
The --no-generate --no-recombine --no-mutate options disable the vanilla Grammarinator generation strategies, so that this example demonstrates only SynthFuzz's mutation strategy.
The --k-ancestors=4 --l-siblings=4 --r-siblings=4 options configure the maximum amount of context to match when selecting a location to insert or mutate in the parse tree of a seed test case. A reasonable default is to set all three values to 4, as shown in this example.
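
To make the context options more concrete, the sketch below shows one plausible way to collect context for a parse-tree node. It gathers the rule names of up to k ancestors and of up to l left and r right siblings. This is an illustration built on assumptions, not SynthFuzz's matching algorithm. The Node class, the context function, and the exact-equality criterion in compatible are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity equality, so same-named siblings stay distinct
class Node:
    """Hypothetical minimal parse-tree node labelled with its grammar rule name."""
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child as the last child of this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


Context = Tuple[Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]


def context(node: Node, k: int, l: int, r: int) -> Context:
    """Return (up to k ancestor names, up to l left and r right sibling names)."""
    ancestors = []
    cur = node.parent
    while cur is not None and len(ancestors) < k:
        ancestors.append(cur.name)
        cur = cur.parent
    left: List[str] = []
    right: List[str] = []
    if node.parent is not None:
        siblings = node.parent.children
        i = siblings.index(node)  # identity lookup thanks to eq=False
        left = [s.name for s in siblings[max(0, i - l):i]]
        right = [s.name for s in siblings[i + 1:i + 1 + r]]
    return tuple(ancestors), tuple(left), tuple(right)


def compatible(a: Node, b: Node, k: int = 4, l: int = 4, r: int = 4) -> bool:
    """Toy criterion (an assumption): locations match if their contexts are identical."""
    return context(a, k, l, r) == context(b, k, l, r)


if __name__ == "__main__":
    root = Node("start_rule")
    block = root.add(Node("block"))
    first = block.add(Node("operation"))
    second = block.add(Node("operation"))
    print(context(second, k=4, l=4, r=4))
    # prints (('block', 'start_rule'), ('operation',), ())
```

In this toy version, larger k, l and r values require more surrounding context to agree, so fewer locations qualify. Smaller values accept more locations whose surroundings are less similar.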

For more information regarding SynthFuzz-specific command line options, run: python -m mlirmut.synthfuzz.generate --help
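
You may prefer to drive the three example steps from Python, for instance to script repeated runs. The following minimal sketch uses only the commands and flags shown above. It is not part of the artifact: pipeline and main are hypothetical names, and the sketch assumes you run it from the example directory inside the container. It only prints the commands unless invoked with --run, which is an option of this sketch, not of SynthFuzz.

```python
import glob
import shlex
import subprocess
import sys

GRAMMAR = "mlir_2023.g4"
RULE = "start_rule"


def pipeline(n: int = 10, depth: int = 100) -> list[list[str]]:
    """Hypothetical helper: the three example commands as argument lists."""
    process = [sys.executable, "-m", "mlirmut.synthfuzz.process", GRAMMAR,
               "--rule", RULE, "-o", "mlirgen"]
    # Without a shell, the inputs/*.mlir wildcard must be expanded explicitly.
    parse = ["grammarinator-parse", "-r", RULE,
             "-i", *sorted(glob.glob("inputs/*.mlir")), "-o", "trees", GRAMMAR]
    generate = [sys.executable, "-m", "mlirmut.synthfuzz.generate",
                "mlir_2023Generator.mlir_2023Generator",
                "-r", RULE, "-d", str(depth), "-o", "outputs/%d.mlir", "-n", str(n),
                "--sys-path", "mlirgen", "--population", "trees",
                "--insert-patterns", "mlirgen/insert_patterns.pkl",
                "--mutation-config", "mutation_config.toml", "--keep-trees",
                "--no-generate", "--no-recombine", "--no-mutate",
                "--k-ancestors=4", "--l-siblings=4", "--r-siblings=4"]
    return [process, parse, generate]


def main(run: bool) -> None:
    for cmd in pipeline():
        print(shlex.join(cmd))
        if run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Print the commands by default; pass --run to actually execute them.
    main(run="--run" in sys.argv[1:])
```

If no seed files match inputs/*.mlir, the -i option receives no paths, and grammarinator-parse will likely reject the command.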

Usage: Generating Figures and Tables

The post-processed branch and dialect pair coverage has been included with this artifact for convenience. If you would like to reproduce the results from scratch, delete the data directory and follow the directions in the Running Experiments and Collecting Coverage From Scratch section before continuing with this section. Note that running the experiments from scratch may take several days, depending on your machine.