Define the what of this extension: minimal scope to deliver a leakage-safe temporal GNN on Elliptic++ using Time-Relaxed Directed (TRD) sampling. This project reuses baseline artifacts (splits.json + baseline metrics) from the completed repo and does not re-train prior GNN/tabular baselines. All tasks align with this spec.
Project: elliptic-trd-gnn
Goal: Implement a clean, reproducible temporal graph baseline by enforcing no-future neighbors and time-aware neighborhood sampling (TRD) on the same Elliptic++ splits as baseline; quantify whether temporal message passing adds value.
Audience: Recruiters, collaborators, future-you.
Deliverable type: Portfolio/demo repo — readable notebooks first, reusable src/ utilities second.
In scope
- Import
splits.jsonand baselinemetrics_summary.csv(source of truth). - Implement TRD sampler with strict rule: for a target node at time
t*, no neighbor with timestamp >t*. - Train TRD-GraphSAGE (primary) and TRD-GCN (optional).
- (Optional) HHGTN-lite: relation buckets (e.g., in/out/self) without full-blown hetero DSL.
- (Optional) Hypergraph incidence pass for tx–entity–tx motifs (minimal, demonstrative).
- Evaluate with same metrics and temporal splits; produce side-by-side table including imported baseline metrics vs TRD results.
Out of scope (for this repo)
- Re-training previous baselines (GCN/GraphSAGE/GAT, XGB/RF/MLP) — already done.
- Heavy foundation hetero models (e.g., Griffin), curvature/geometry pipelines, production serving.
- Any synthetic/mock data.
Identity: Elliptic++ Bitcoin transaction graph (nodes=transactions; edges=directed flows).
Location: data/Elliptic++ Dataset/ (local only; user provides files).
Download: https://drive.google.com/drive/folders/1MRPXz79Lu_JGLlJ21MDfML44dKN9R08l
Required files
txs_features.csv—txid,timestamp, Local_feature_1..93, Aggregate_feature_1..89 (total 182)txs_classes.csv—txid,class(1=fraud, 2=legit, 3=unknown)txs_edgelist.csv—txId1,txId2
Data policy
- No synthetic data. Stop if files/columns mismatch; request correct path.
- Notebooks must verify file presence before running.
- Any preprocessing is deterministic and logged.
Source of truth: Reuse baseline splits.json.
- Train/Val/Test membership identical to baseline.
- For TRD sampling, enforce split isolation for supervision and no-future neighbors: for node
uwith timet*, sampled neighborsvmust satisfytime(v) ≤ t*. - Keep counts/boundaries identical; persist
splits.jsoncopy underdata/Elliptic++ Dataset/.
- Map
txid→ contiguous indices[0..N-1](persist mapping; must match splits). - Filter edges to known nodes; coalesce duplicates.
- Optional normalization for GNN inputs (fit on train only; apply to val/test).
- Feature policy: start with Local (AF1–AF93) to avoid double-encoding aggregate neighbor stats; optionally compare with All (AF1–AF182) and document any redundancy.
- Enforce directed time-aware neighborhood: no neighbor with
timestamp > target. - Configurable caps:
max_in_neighbors,max_out_neighbors,fanout_per_layer. - Reject/flag any violation; unit test must pass.
- TRD-GraphSAGE (primary) — mean/concat aggregators with TRD neighborhoods.
- TRD-GCN (optional) — GCN layers over TRD-induced subgraphs.
Common: in_channels, hidden_channels, out_channels=2, num_layers, dropout, ReLU.
Output: logits [N,2] for labeled nodes in each split.
- HHGTN-lite: bucket edges by relation type (e.g., IN, OUT, SELF) and apply per-relation weights; single additional relation-mix layer.
- Hypergraph (toy): construct incidence matrix for simple tx–entity hyperedges (e.g., address clusters) and one hyperedge aggregation pass; purely demonstrative.
Loss
CrossEntropyLosson logits[N,2](orBCEWithLogitsconsistently).- Use
pos_weight/ class weights computed from train labels.
Optimization
- Adam,
lr(default 1e-3),weight_decay(default 5e-4). - Early stopping on val PR-AUC (patience ≈ 15 epochs).
Evaluation protocol
-
Threshold selected on val to maximize F1; reuse on test.
-
Report on test:
- PR-AUC (primary), ROC-AUC, F1 at val-selected threshold, Recall@K (K ∈ {0.5%, 1%, 2%}).
-
Side-by-side comparison:
- Import prior baseline metrics (tabular & static GNNs) from baseline
reports/metrics_summary.csv. - Append new TRD rows computed here.
- Produce consolidated table + PR/ROC plots.
- Import prior baseline metrics (tabular & static GNNs) from baseline
Artifacts
-
checkpoints/trd_graphsage_best.pt(and/ortrd_gcn_best.pt) -
reports/metrics.json -
reports/plots/*.png(PR/ROC; TRD diagnostics) -
Append to
reports/metrics_summary.csv:timestamp, experiment, model, split, pr_auc, roc_auc, f1, recall@1%
Always:
from src.utils.seed import set_all_seeds
set_all_seeds(seed)
import torch
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False-
Save
splits.json, scaler params, library versions. -
Avoid absolute paths; use project-relative paths.
-
Provenance lock:
docs/baseline_provenance.jsonwith:- baseline repo URL, commit SHA, Zenodo DOI, date imported.
elliptic-trd-gnn/
│
├── data/
│ └── Elliptic++ Dataset/ # user-provided dataset + baseline splits.json
│
├── notebooks/
│ ├── 01_trd_sampler_mvp.ipynb # TRD checks + subset PR/ROC
│ └── 02_trd_graphsage.ipynb # Full TRD-GraphSAGE run
│
├── src/
│ ├── data/
│ │ ├── elliptic_loader.py # masks + splits (reused)
│ │ └── trd_sampler.py # time-relaxed, no-future neighbors
│ ├── models/
│ │ ├── trd_graphsage.py
│ │ └── trd_gcn.py
│ ├── utils/
│ │ ├── metrics.py
│ │ ├── seed.py
│ │ └── logger.py
│ ├── train.py # CLI wrappers (config-driven)
│ └── eval.py
│
├── configs/
│ ├── trd_graphsage.yaml
│ └── trd_gcn.yaml
│
├── tests/
│ └── test_trd_sampler.py # assert no future neighbors
│
├── reports/
│ ├── plots/
│ └── metrics_summary.csv
│
├── docs/
│ ├── PROJECT_SPEC_v3.md
│ ├── AGENT_v3.MD
│ ├── START_PROMPT_v3.md
│ ├── CLONE_INIT_PROMPT_v3.md
│ └── baseline_provenance.json
│
├── README.md
├── requirements.txt
└── .gitignore
experiment: "elliptic-trd-gnn"
seed: 42
device: "cuda" # or "cpu"
baseline:
provenance: "docs/baseline_provenance.json"
metrics_csv: "../FRAUD-DETECTION-GNN/reports/metrics_summary.csv" # or local copy
reuse_splits: true
data:
root: "data/Elliptic++ Dataset"
features: "txs_features.csv"
labels: "txs_classes.csv"
edges: "txs_edgelist.csv"
splits_file: "splits.json"
trd:
forbid_future_neighbors: true
max_in_neighbors: 15
max_out_neighbors: 15
fanout_per_layer: [15, 10] # l1, l2
direction_weight: {in: 1.0, out: 1.0}
allow_backward_if_past: true
model:
name: "trd_graphsage" # trd_graphsage | trd_gcn
hidden_channels: 128
num_layers: 2
dropout: 0.4
train:
epochs: 80
batch_size: null # full-batch if feasible; else micro-batch
lr: 0.001
weight_decay: 0.0005
early_stopping_patience: 15
eval:
recall_k_fracs: [0.005, 0.01, 0.02]
save_plots: true
logging:
out_dir: "reports"reports/metrics.json (per split)
{
"pr_auc": 0.5721,
"roc_auc": 0.8663,
"best_f1": 0.4987,
"threshold": 0.421,
"recall@1%": 0.392
}Append row to reports/metrics_summary.csv:
timestamp,experiment,model,split,pr_auc,roc_auc,f1,recall@1%
1731143000,elliptic-trd-gnn,TRD-GraphSAGE,test,0.572100,0.866300,0.498700,0.392000
Consolidated table
- A small utility merges imported baseline rows and the new TRD rows for the README.
M1 — Bootstrap & Import
- Repo scaffold matches Section 8;
pip install -r requirements.txtsucceeds. baseline_provenance.jsoncreated; baselinemetrics_summary.csvimported/copied.
M2 — TRD Sampler MVP
trd_sampler.pyimplemented; unit test passes (no future neighbors).01_trd_sampler_mvp.ipynbruns a subset; saves mini PR/ROC; appends CSV.
M3 — TRD-GraphSAGE Full
02_trd_graphsage.ipynbcompletes; saves metrics, plots, checkpoint; CSV appended.
M4 — Comparison Report
- Consolidated table (baseline vs TRD) produced; plots saved to
reports/plots/; README updated.
M5 — (Optional) Variants
- HHGTN-lite or Hypergraph toy pass run once; notes added to README.
- Temporal leakage: TRD sampler enforces
time(neighbor) ≤ time(target); unit test guards this. - Split contamination: Supervision restricted by split masks; edges for sampling may span times only if they obey the no-future rule; document policy per notebook.
- Double-encoding with AF aggregates: Prefer Local AF1–AF93 as default; document any changes.
- ID misalignment: Persist and validate
txid↔index mapping before training. - Non-reproducibility: Seeds + deterministic ops; reuse exact splits; log versions.
- Add calibrated thresholds under shift (expected prevalence drift).
- Rich heterogeneous modeling (full HHGTN), real entity graphs.
- End-to-end temporal contrastive pretraining (future repo).
- Serve lightweight scoring for batch inference (separate project).
- Respect Elliptic++ dataset licensing/terms.
- Cite the dataset and the baseline repo’s DOI in README.
- This repo is educational/demonstrative.
End of PROJECT_SPEC_v3.md (TRD-GNN Temporal Extension).