ICSE 2025: Fuzzing MLIR compilers with Custom Mutation Synthesis

UCLA-SEAL/SynthFuzz

Paper

Fuzzing MLIR compilers with Custom Mutation Synthesis. Ben Limpanukorn, Jiyuan Wang, Hong Jin Kang, Eric Zitong Zhou, Miryung Kim. 47th International Conference on Software Engineering (ICSE '25), 12 pages. https://arxiv.org/abs/2404.16947

A video recording of the slides presented at ICSE 2025 is available here: https://youtu.be/lQ4vZQF2BnQ

Provenance

This artifact, including code and data, can be obtained on FigShare: https://doi.org/10.6084/m9.figshare.25458925

The code is also available on GitHub at: https://github.com/UCLA-SEAL/SynthFuzz

A preprint has been made available at: https://arxiv.org/abs/2404.16947

Setup

Hardware

This artifact was tested on a machine with an AMD Ryzen 2950X CPU with 32 GB of RAM.

Software

Before running this artifact, please install Docker (Installation Instructions). Please also ensure that this artifact is extracted to a directory whose absolute path does not contain spaces.

  1. Build the Docker image, either from scratch by running ./docker/build_from_scratch.sh or by running ./docker/build_with_prebuilt.sh to download a pre-built image from DockerHub.
  2. Start the container by running ./docker/run_default.sh
  3. Enter the container by running ./docker/attach.sh. All commands after this point should be run inside the container.
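
The requirement above that the artifact's absolute path contain no spaces can be checked before building the image. The following is a minimal sketch; the helper name path_contains_space is hypothetical and not part of the artifact.

```python
import os


def path_contains_space(path="."):
    """Return True if the absolute form of `path` contains a space character."""
    absolute = os.path.abspath(path)
    return " " in absolute


if __name__ == "__main__":
    # Run this from the directory where the artifact was extracted.
    if path_contains_space("."):
        print("Warning: move the artifact to a directory whose path has no spaces.")
    else:
        print("Path looks fine:", os.path.abspath("."))
```

Running it from the extraction directory prints either a warning or the resolved path, so the check can be done before any Docker step.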

Usage: Example

An example script and data have been provided under the example directory to demonstrate how to use SynthFuzz.

SynthFuzz is implemented as an extension to Grammarinator and can be used in the same fashion.

To run this example, simply run the example script:

cd example
./example.sh

This script will execute the following commands:

First, the grammar is processed to produce a test generator. For SynthFuzz, this step also generates an insert_patterns.pkl file, which the fuzzer uses to determine which production rules contain quantifiers.

python -m mlirmut.synthfuzz.process mlir_2023.g4 --rule start_rule -o mlirgen
  • The --rule <rule_name> option sets the starting/root production rule of the grammar. For mlir_2023.g4, the starting rule is named start_rule.
  • The -o <output_dir> option sets the directory in which to write the insert_patterns.pkl file.
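
To make the phrase "production rules that contain quantifiers" concrete, here is a rough sketch that scans a toy ANTLR-style grammar for rules using *, + or ?. It is only an illustration: the function name rules_with_quantifiers and the toy grammar are assumptions, it ignores comments and other ANTLR syntax, and it is not how mlirmut.synthfuzz.process is implemented.

```python
import re


def rules_with_quantifiers(grammar_text):
    """Return the names of rules whose bodies use *, + or ? outside quoted literals."""
    # Blank out quoted literals so a token such as '+' or ';' is not misread.
    text = re.sub(r"'(?:\\.|[^'\\])*'", "''", grammar_text)
    names = []
    # A rule is approximated as: name, colon, body, terminating semicolon.
    for match in re.finditer(r"(\w+)\s*:(.*?);", text, flags=re.DOTALL):
        rule_name, body = match.groups()
        if re.search(r"[*+?]", body):
            names.append(rule_name)
    return names


# Hypothetical toy grammar, not taken from mlir_2023.g4.
TOY_GRAMMAR = """
grammar Toy;
start_rule : op* EOF ;
op : ID '+' ID ;
attr : ID ('=' ID)? ;
"""

if __name__ == "__main__":
    print(rules_with_quantifiers(TOY_GRAMMAR))
```

In the toy grammar, start_rule and attr are reported because they contain a repetition and an optional group, while op is not, since its only + is a quoted token.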

Then, using Grammarinator, the inputs are parsed using the grammar to generate parse-trees which are used as seed inputs for the fuzzer:

grammarinator-parse \
    -r start_rule \
    -i inputs/*.mlir \
    -o trees \
    mlir_2023.g4
  • The -r <rule_name> option sets the starting/root production rule of the grammar. For mlir_2023.g4, the starting rule is named start_rule.
  • The -i <inputs> option takes the paths to the seed inputs to be parsed.
  • The -o <output_dir> option sets the directory to output the parse-trees of the given seed inputs.
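
If the parsing step needs to be driven from Python rather than from the shell, the command above can be assembled as an argument list. This is a sketch under assumptions: build_parse_command is a hypothetical helper, and it uses only the flags shown above.

```python
import glob


def build_parse_command(grammar, start_rule, inputs, out_dir):
    """Assemble the argv list for grammarinator-parse from the flags shown above."""
    if not inputs:
        raise ValueError("at least one seed input is required")
    return ["grammarinator-parse", "-r", start_rule, "-i", *inputs, "-o", out_dir, grammar]


if __name__ == "__main__":
    # The shell normally expands inputs/*.mlir; glob does the same here.
    seeds = sorted(glob.glob("inputs/*.mlir"))
    if seeds:
        cmd = build_parse_command("mlir_2023.g4", "start_rule", seeds, "trees")
        print(" ".join(cmd))
        # To execute it inside the container, from the example directory:
        # import subprocess; subprocess.run(cmd, check=True)
    else:
        print("No seed inputs found under inputs/")
```

Passing the list to subprocess.run avoids shell quoting issues with the expanded seed paths.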

Finally, the fuzzer can be used to generate inputs:

python -m mlirmut.synthfuzz.generate \
    mlir_2023Generator.mlir_2023Generator \
    -r start_rule \
    -d 100 \
    -o outputs/%d.mlir \
    -n 10 \
    --sys-path mlirgen \
    --population trees \
    --insert-patterns mlirgen/insert_patterns.pkl \
    --mutation-config mutation_config.toml \
    --keep-trees \
    --no-generate --no-recombine --no-mutate \
    --k-ancestors=4 --l-siblings=4 --r-siblings=4
  • The -r <rule_name> option sets the starting/root production rule of the grammar. For mlir_2023.g4, the starting rule is named start_rule.
  • The -d 100 option sets the maximum depth of a mutated parse-tree to be 100. This limits t
  • The -o outputs/%d.mlir option sets the path and filename pattern for the generated fuzz inputs.
  • The -n 10 option sets the target number of generated fuzz inputs to 10.
  • The --sys-path mlirgen option takes the path to the output directory specified earlier for the python -m mlirmut.synthfuzz.process command. This is only needed if you use the --generate option to use Grammarinator's grammar-based generation strategy.
  • The --population trees option takes the path to the parse-trees of the seed inputs. This should be the output directory of the previous grammarinator-parse command.
  • The --insert-patterns mlirgen/insert_patterns.pkl option takes the path to the insert_patterns.pkl file generated by the python -m mlirmut.synthfuzz.process command. SynthFuzz uses it to determine possible insertion locations in the parse-tree.
  • The --mutation-config mutation_config.toml option takes a configuration file that specifies optional heuristics to improve mutation for a specific grammar/domain. The available configuration options are documented in the example file: example/mutation_config.toml.
  • The --keep-trees option adds any mutated seed inputs to the seed corpus after generation. This enables SynthFuzz to apply multiple mutations to the same input over time.
  • The --no-generate --no-recombine --no-mutate options disable the vanilla Grammarinator generation strategies so that this example demonstrates only SynthFuzz's mutation strategy.
  • The --k-ancestors=4 --l-siblings=4 --r-siblings=4 options configure the maximum amount of context to match when selecting a location to insert/mutate in the parse tree of a seed test case. A reasonable default is to set all values to 4, as shown in this example.
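
One way to picture the context controlled by --k-ancestors, --l-siblings and --r-siblings is as a bounded window around a parse-tree node: up to k ancestors above it, up to l siblings to its left, and up to r siblings to its right. The sketch below illustrates that idea on a toy tree. The Node class and context function are hypothetical, and SynthFuzz's actual matching logic may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A toy parse-tree node (hypothetical, not Grammarinator's tree class)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def context(node, k, l, r):
    """Return (ancestors, left siblings, right siblings), each capped at k, l, r names."""
    ancestors = []
    current = node.parent
    while current is not None and len(ancestors) < k:
        ancestors.append(current.name)
        current = current.parent
    if node.parent is None:
        return tuple(ancestors), (), ()
    siblings = node.parent.children
    index = next(i for i, sibling in enumerate(siblings) if sibling is node)
    left = tuple(s.name for s in siblings[max(0, index - l):index])
    right = tuple(s.name for s in siblings[index + 1:index + 1 + r])
    return tuple(ancestors), left, right


if __name__ == "__main__":
    root = Node("module")
    func = root.add(Node("func"))
    ops = [func.add(Node(name)) for name in "abcde"]
    print(context(ops[2], k=4, l=4, r=4))
```

Larger values capture more surrounding structure, which makes a match more specific; smaller values make more locations eligible.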

For more information regarding SynthFuzz-specific command line options, run: python -m mlirmut.synthfuzz.generate --help

Usage: Generating Figures and Tables

The post-processed branch and dialect pair coverage data is included with this artifact for convenience. To reproduce the results from scratch, delete the data directory and follow the directions in the Running Experiments from Scratch and Collecting Coverage section before continuing with this section. Note that running the experiments from scratch may take several days, depending on your machine.

RQ1: Branch Coverage

All commands should be run inside the Docker container.

cd /synthfuzz
python figures-tables/coverage.py

RQ2: Dialect Pair Coverage

All commands should be run inside the Docker container.

cd /synthfuzz
python figures-tables/diversity.py

RQ3: Context-based Location Selection

All commands should be run inside the Docker container.

cd /synthfuzz
python figures-tables/ablation-context.py

RQ4: Parameterization

All commands should be run inside the Docker container.

cd /synthfuzz
python figures-tables/ablation-params.py
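
The four figure scripts above can also be run in one pass. The following is a small sketch; FIGURE_SCRIPTS and figure_commands are hypothetical names, while the script paths are the ones listed in the RQ1 to RQ4 sections.

```python
import sys

# Research question -> script path, as listed in the sections above.
FIGURE_SCRIPTS = {
    "RQ1": "figures-tables/coverage.py",
    "RQ2": "figures-tables/diversity.py",
    "RQ3": "figures-tables/ablation-context.py",
    "RQ4": "figures-tables/ablation-params.py",
}


def figure_commands(python=sys.executable):
    """Return one argv list per figure script, in RQ order."""
    return [[python, script] for script in FIGURE_SCRIPTS.values()]


if __name__ == "__main__":
    for rq, cmd in zip(FIGURE_SCRIPTS, figure_commands("python")):
        print(rq, " ".join(cmd))
        # To execute inside the container:
        # import subprocess; subprocess.run(cmd, cwd="/synthfuzz", check=True)
```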

Running Experiments from Scratch and Collecting Coverage

All commands should be run inside the Docker container.

This section is only required if you would like to re-generate the data directory from scratch. Total estimated time required: ~70 hours.
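
As a quick sanity check, the ~70 hour total can be compared against the per-step estimates given in this section. The optional NeuRI step has no stated estimate, so it is left out of this back-of-the-envelope sum.

```python
compile_hours = 10            # compiling each subject program (~10 hours)
seed_hours = 10 / 60          # extracting seed test cases (~10 minutes)
experiment_hours = (50, 60)   # running each experiment (~50-60 hours)

low = compile_hours + seed_hours + experiment_hours[0]
high = compile_hours + seed_hours + experiment_hours[1]
print(f"Estimated total: {low:.1f} to {high:.1f} hours")
```

The stated total therefore corresponds to the upper end of the range.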

  1. Compile each subject program (~10 hours) with coverage enabled:
cd /synthfuzz/eval
# build mlir-opt
./mlir/build_mlir.sh
# build onnx-mlir-opt
./onnx/build_onnx_mlir.sh
# build triton-opt
./triton/build.sh
# build circt-opt
./circt/build_circt.sh
  2. Extract seed test cases from each subject's repositories (~10 minutes):
cd /synthfuzz/eval
./mlir/find_seeds.sh
./onnx/find_seeds.sh
./triton/find_seeds.sh
./circt/find_seeds.sh
  3. Optional (only if you want to evaluate against NeuRI): For this step only, NeuRI needs to be run in its own container. Run the following outside the synthfuzz-artifact-icse2025 container:
cd synthfuzz-icse2025/eval/neuri
./start_docker.sh
./gen_indocker.sh  # inside the neuri-artifact container

Then return to the synthfuzz-artifact-icse2025 container and run:

cd /synthfuzz/eval/neuri
python copy_models.py
python tf_to_onnx.py
python onnx_to_mlir.py
python onnx_to_onnx_mlir.py
  4. Run each experiment. Each fuzzing run is allocated 4 hours. Additionally, indexing and merging the coverage profiles may require significant memory and time (1-2 hours per experiment). In total, this may require ~50-60 hours depending on the CPU and memory available.
# install computepairs
cd /synthfuzz/computepairs
go install

# ablation
cd /synthfuzz/eval/mlir/ablation/context && ./run.sh
cd /synthfuzz/eval/mlir/ablation && ./no_parameters.sh
cd /synthfuzz/eval/mlir/ablation && ./with_parameters.sh

# Coverage experiments

cd /synthfuzz/eval/mlirsmith && ./run.sh
cd /synthfuzz/eval/mlir/baseline && ./run.sh
cd /synthfuzz/eval/mlir/synthfuzz && ./run.sh
cd /synthfuzz/eval/mlir/grammarinator && ./run.sh
cd /synthfuzz/eval/mlir/mlirsmith && ./run.sh

cd /synthfuzz/eval/onnx/baseline && ./run.sh
cd /synthfuzz/eval/onnx/synthfuzz && ./run.sh
cd /synthfuzz/eval/onnx/grammarinator && ./run.sh
cd /synthfuzz/eval/onnx/mlirsmith && ./run.sh

cd /synthfuzz/eval/triton/baseline && ./run.sh
cd /synthfuzz/eval/triton/synthfuzz && ./run.sh
cd /synthfuzz/eval/triton/grammarinator && ./run.sh
cd /synthfuzz/eval/triton/mlirsmith && ./run.sh

cd /synthfuzz/eval/circt/baseline && ./run.sh
cd /synthfuzz/eval/circt/synthfuzz && ./run.sh
cd /synthfuzz/eval/circt/grammarinator && ./run.sh
cd /synthfuzz/eval/circt/mlirsmith && ./run.sh

# Only if step 3 was followed:
cd /synthfuzz/eval/mlir/neuri && ./run.sh
cd /synthfuzz/eval/onnx/neuri && ./run.sh
cd /synthfuzz/eval/triton/neuri && ./run.sh
cd /synthfuzz/eval/circt/neuri && ./run.sh
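
The per-subject coverage commands above follow a regular subject/tool pattern, so they can be generated instead of typed out. This is a sketch only: coverage_commands is a hypothetical helper, and it does not cover the ablation runs or the standalone /synthfuzz/eval/mlirsmith run, which do not follow the pattern.

```python
from itertools import product

SUBJECTS = ["mlir", "onnx", "triton", "circt"]
TOOLS = ["baseline", "synthfuzz", "grammarinator", "mlirsmith"]


def coverage_commands(include_neuri=False):
    """Return the shell commands for the subject/tool coverage experiments."""
    commands = [
        f"cd /synthfuzz/eval/{subject}/{tool} && ./run.sh"
        for subject, tool in product(SUBJECTS, TOOLS)
    ]
    if include_neuri:
        # Only meaningful if the optional NeuRI step was followed.
        commands += [f"cd /synthfuzz/eval/{subject}/neuri && ./run.sh" for subject in SUBJECTS]
    return commands


if __name__ == "__main__":
    for command in coverage_commands():
        print(command)
```

Printing the commands first, rather than executing them directly, makes it easy to review the list before starting runs that each take several hours.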
