This repository contains a modular implementation of MIME for partial-view, zero-prior, hard-label GNN model extraction. It keeps only the standard MIME pipeline:
- label-free attributed-graph bootstrapping with DGI,
- entropy-diversity query selection under a node-level hard-label budget,
- topology-regularized surrogate refinement with a Laplacian prediction penalty.
The code is refactored from the original standalone runner into reusable Python modules. It removes baseline methods and extra experimental sweeps, so each run only executes MIME on one dataset, one query budget, and one random seed.
mime_github/
├── README.md
├── requirements.txt
├── run_mime.py
└── mime_steal/
├── __init__.py
├── config.py # MIMEConfig dataclass
├── data.py # dataset loading, splitting, visible subgraph construction
├── mime.py # public run_mime(...) API
├── models.py # GCN and DGI models
├── query.py # k-center, entropy, exploration split
├── train.py # victim training, surrogate refinement, evaluation
└── utils.py # seeds, metrics, serialization utilities
Create an environment with PyTorch and PyTorch Geometric. The exact PyG install command depends on your CUDA/PyTorch version; see the official PyG instructions if needed.
pip install -r requirements.txtFor OGB datasets such as Arxiv and Products, ogb is required.
The command-line loader supports:
CoCSCoraFullArxivProductsAmazon-ratingsSquirrel
For Products, the code uses the front-200K induced subgraph by OGB node order as the working graph.
Run MIME on CoCS with a query budget of 20C, where C is the number of classes:
python run_mime.py --dataset CoCS --c 20 --seed 42Run on Arxiv using CUDA and save results to a custom file:
python run_mime.py \
--dataset Arxiv \
--c 20 \
--seed 42 \
--device cuda \
--output results/arxiv_c20_seed42.csvRun Products with the front-200K working graph:
python run_mime.py --dataset Products --c 20 --seed 42 --device cudafrom mime_steal import MIMEConfig, run_mime
config = MIMEConfig(
root="./data",
prior_ratio=0.10,
device="cuda",
eval_scope="visible", # or "full"
)
result = run_mime(dataset="CoCS", c=20, seed=42, config=config)
print(result)The result dictionary includes accuracy, fidelity, number of queried nodes, runtime breakdown, and memory usage.
The default setting is:
--eval-scope visibleThis evaluates the surrogate on unqueried nodes inside the attacker-visible partial attributed graph, so the surrogate is never given the provider-side full graph during evaluation.
For compatibility with older transductive experimental runners, you can use:
--eval-scope fullThis evaluates the learned surrogate weights on the full graph test mask. The attack still queries only nodes from the attacker-visible partial graph.
python run_mime.py \
--dataset CoCS \
--c 20 \
--seed 42 \
--root ./data \
--device cuda \
--prior-ratio 0.10 \
--eval-scope visibleOptional training knobs:
--victim-epochs 200
--dgi-epochs 200
--epochs-per-round 100
--final-epochs 200
--hidden-dim 128
--lap-lambda 5e-4- Victim split: 60% victim-training nodes and 40% held-out test nodes.
- Visible prior: sampled from the victim-training portion according to
--prior-ratio. - Oracle: each query returns one top-1 victim prediction for one real node in the visible partial graph.
- Budget: total queries =
c * number_of_classes. - Surrogate: 2-layer GCN.
- Initial query batch: greedy k-center coverage on DGI representations.
- Later query batches: exploration coverage plus entropy-diversity selection.
- Surrogate loss: hard-label cross-entropy plus prediction-level Laplacian regularization.
Each run appends one row to the output CSV and prints a JSON object like:
{
"dataset": "CoCS",
"seed": 42,
"c": 20,
"budget": 300,
"visible_nodes": 1833,
"queried_nodes": 300,
"eval_scope": "visible",
"accuracy": 0.91,
"fidelity": 0.92,
"pretrain_time": 12.34,
"query_time": 0.56,
"train_time": 78.90,
"total_time": 100.12
}This repository intentionally keeps only the standard MIME attack. It does not include Random, CEGA, GNNSteal, EffGNN, OnStealing, label-flip defenses, domain-shift sweeps, or multi-seed table runners.