This repository contains the official implementation and dataset for the paper:
gMBA: Expression Semantic Guided Mixed Boolean-Arithmetic Deobfuscation Using Transformer Architectures
[Accepted at Findings of ACL 2025]
Mixed Boolean-Arithmetic (MBA) expressions are widely used in software obfuscation to hinder reverse engineering and malware analysis.
gMBA is a novel Transformer-based sequence-to-sequence model that translates obfuscated MBA expressions into their simplified forms.
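To make the task concrete, here is a classic MBA identity (not necessarily one from this dataset): the obfuscated form `(x ^ y) + 2 * (x & y)` computes the same value as the plain `x + y`. A quick sketch checking this over a small range:

```python
# Classic MBA identity: (x ^ y) + 2 * (x & y) == x + y.
# The xor captures the carry-free sum bits and (x & y) the carry bits,
# so shifting the carries left by one (multiplying by 2) recovers addition.
for x in range(-8, 9):
    for y in range(-8, 9):
        assert (x ^ y) + 2 * (x & y) == x + y
print("identity holds on the sampled range")
```

Deobfuscation is the inverse problem: given the mixed Boolean-arithmetic form, recover the simple arithmetic one.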
Unlike prior approaches, gMBA leverages:
- Semantic features from the expression via automatically constructed truth tables
- A lightweight yet expressive Transformer architecture guided by these semantics
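The repository's actual feature pipeline may differ in encoding details, but the idea of a truth table as a semantic fingerprint can be sketched as follows: evaluate an expression at every 0/1 assignment of its variables, so that semantically equivalent expressions (obfuscated or not) yield the same vector. The `truth_table` helper below is illustrative, not the paper's implementation:

```python
from itertools import product

def truth_table(expr, variables):
    """Evaluate expr at every 0/1 assignment of its variables.

    The resulting vector is a simple semantic fingerprint:
    equivalent expressions produce identical truth tables.
    """
    rows = []
    for values in product([0, 1], repeat=len(variables)):
        env = dict(zip(variables, values))
        rows.append(eval(expr, {}, env))  # illustrative only; eval is unsafe on untrusted input
    return rows

# An obfuscated expression and its simplified form share a truth table.
obfuscated = "(x ^ y) + 2 * (x & y)"
simplified = "x + y"
assert truth_table(obfuscated, ["x", "y"]) == truth_table(simplified, ["x", "y"])
print(truth_table(simplified, ["x", "y"]))  # [0, 1, 1, 2]
```

A vector like this can be fed to the model alongside the token sequence, which is what the `*_tt_added.csv` files in `data/` provide in precomputed form.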
git clone https://github.com/your-username/gMBA.git
cd gMBA
pip install -r requirements.txt
If the PyTorch packages are not resolved correctly by requirements.txt, you can install the pinned versions directly:
pip install torch==2.0.0+cu117 -f https://download.pytorch.org/whl/torch_stable.html
pip install torchtext==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install torchdata==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html
Then run the notebook files in the directories below.
.
├── data/
│   ├── train_s.csv
│   ├── train_s-bool_tt_added.csv
│   ├── train_s-arith_tt_added.csv
│   ├── train_s-both_tt_added.csv
│   ├── test.csv
│   ├── test-bool_tt_added.csv
│   ├── test-arith_tt_added.csv
│   └── test-both_tt_added.csv
│
├── baseline/
│   ├── transformer_baseline.py
│   └── ...
│
├── addition/
│   ├── transformer_add.ipynb
│   └── ...
├── concat_token/
│   ├── transformer_cat_tok.ipynb
│   └── ...
├── concat_hiddim/
│   ├── transformer_cat_hiddim.ipynb
│   └── ...
│
├── README.md
└── requirements.txt