AstroML is a research-driven Python framework for building dynamic graph machine learning models on the Stellar Development Foundation Stellar blockchain.
It treats blockchain data as a multi-asset, time-evolving graph, enabling advanced ML research on transaction networks such as fraud detection, anomaly detection, and behavioral modeling.
AstroML provides end-to-end tooling for:
- Ledger ingestion and normalization
- Dynamic transaction graph construction
- Feature engineering for blockchain accounts
- Graph Neural Networks (GNNs)
- Self-supervised node embeddings
- Anomaly detection
- Temporal modeling
- Reproducible ML experimentation
Blockchain networks are naturally graph-structured systems:
| Blockchain Concept | Graph Representation |
|---|---|
| Accounts | Nodes |
| Transactions | Directed edges |
| Assets | Edge types |
| Time | Dynamic dimension |
Most analytics tools rely on static heuristics or SQL queries.
AstroML instead enables:
- Dynamic graph learning
- Temporal GNNs
- Representation learning
- Research-grade experimentation
AstroML is designed for:
- ML researchers
- Graph ML engineers
- Fraud detection teams
- Blockchain data scientists
┌─────────────────────────────────────────────────────────────────────────┐
│ AstroML: Ingestion → Graph → Train │
└─────────────────────────────────────────────────────────────────────────┘
┌──────────────┐
│ Stellar │
│ Ledgers │
└──────┬───────┘
│
┌────────────────▼────────────────┐
│ 1. INGESTION LAYER │
│ ├─ Ledger backfill (Polars) │
│ ├─ Incremental ingestion │
│ ├─ State tracking (idempotent)│
│ └─ PostgreSQL storage │
└────────────────┬────────────────┘
│
┌────────────────▼────────────────┐
│ 2. NORMALIZATION LAYER │
│ ├─ Raw Stellar schema │
│ │ (Ledger, Transaction, Op) │
│ ├─ Graph mirror layer │
│ │ (GraphAccount, GraphEdge) │
│ └─ Composite indexes │
│ (account_id, timestamp) │
└────────────────┬────────────────┘
│
┌────────────────▼────────────────┐
│ 3. GRAPH BUILDING LAYER │
│ ├─ Time-windowed snapshots │
│ ├─ Edge construction │
│ ├─ Node indexing │
│ └─ Graph validation │
└────────────────┬────────────────┘
│
┌────────────────▼────────────────┐
│ 4. FEATURE ENGINEERING │
│ ├─ Transaction frequency │
│ ├─ Asset diversity │
│ ├─ Structural importance │
│ │ (degree, betweenness, PR) │
│ ├─ Feature store & versioning │
│ └─ Point-in-time queries │
└────────────────┬────────────────┘
│
┌────────────────▼────────────────┐
│ 5. TRAINING LAYER │
│ ├─ Temporal train/test split │
│ ├─ Link prediction task │
│ ├─ Negative sampling │
│ ├─ PyTorch Geometric models │
│ │ (GCN, GraphSAGE, GAT) │
│ └─ Early stopping │
└────────────────┬────────────────┘
│
┌────────────────▼────────────────┐
│ 6. BENCHMARKING & EVALUATION │
│ ├─ Reproducible configs │
│ ├─ Random seed tracking │
│ ├─ Metric computation │
│ │ (AUC, Precision, Recall) │
│ ├─ Memory profiling │
│ └─ Result persistence │
└────────────────┬────────────────┘
│
┌──────▼──────┐
│ Baseline │
│ Results │
└─────────────┘
Stellar Ledger Data
↓
[Ingestion Service]
├─ Fetch ledgers (1000000-1100000)
├─ Track state (.astroml_state/ingestion_state.json)
└─ Store in PostgreSQL
↓
[Database Schema]
├─ Raw Layer: Ledger, Transaction, Operation, Account, Asset
├─ Graph Layer: GraphAccount, GraphEdge, GraphTransactionDetail
└─ Indexes: (account_id, timestamp) composite
↓
[Graph Snapshot]
├─ Query operations by time window
├─ Create Edge objects (src, dst, timestamp, asset, amount)
├─ Build node_index mapping
└─ Validate graph (isolated nodes, self-loops, density)
↓
[Feature Store]
├─ Compute node features (frequency, diversity, centrality)
├─ Compute edge features (asset type, amount, direction)
├─ Version features with metadata
└─ Store in SQLite + Parquet
↓
[Temporal Split]
├─ Sort edges by timestamp
├─ Split at cutoff (80% train, 20% test)
└─ Ensure no future data leaks into training
↓
[Link Prediction Task]
├─ Context window: edges before cutoff
├─ Future window: edges after cutoff
├─ Positive labels: future edges
├─ Negative sampling: random non-edges
└─ Binary classification objective
↓
[Model Training]
├─ LinkPredictor(encoder + decoder)
├─ Adam optimizer with early stopping
├─ Compute AUC, Precision, Recall, F1
└─ Track training/validation losses
↓
[Benchmark Results]
├─ config.json (full configuration + seed)
├─ result.json (metrics + performance)
└─ metadata.json (run_id, timestamp, linking files)
astroml/
├── ingestion/ # Ledger ingestion & state tracking
│ ├── service.py # IngestionService (incremental, idempotent)
│ ├── state.py # StateStore (tracks processed ledgers)
│ └── backfill.py # Bulk ledger loading
├── db/ # Database layer
│ ├── schema.py # SQLAlchemy ORM models
│ └── session.py # Database connection management
├── features/ # Feature engineering
│ ├── feature_store.py # Enterprise feature management
│ ├── graph/
│ │ └── snapshot.py # Time-windowed graph construction
│ ├── frequency.py # Transaction frequency features
│ ├── asset_diversity.py
│ └── gnn/ # Graph neural network layers
├── models/ # ML models
│ ├── link_predictor.py
│ ├── gcn.py
│ ├── sage.py
│ └── deep_svdd.py
├── tasks/ # Training tasks
│ └── link_prediction_task.py
├── training/ # Training utilities
│ ├── temporal_split.py # Prevent data leakage
│ └── train_link_prediction.py
├── benchmarking/ # Benchmarking framework
│ ├── core.py # ModelBenchmark orchestrator
│ ├── config.py # Configuration management
│ └── metrics.py # Metric computation
├── quick_start.py # Quick start pipeline
└── cli.py # Command-line interface
# Run quick start with default settings (100 ledgers, 50 accounts, 10 epochs)
make quickstart
# Run with more data for thorough testing
make quickstart-verbose# Run quick start with default settings
python -m astroml.quick_start
# Run with custom parameters
python -m astroml.quick_start --num-ledgers 200 --num-accounts 100 --epochs 20 --seed 42# Run quick start command
python -m astroml quickstart --num-ledgers 100 --num-accounts 50 --epochs 10 --seed 42The quick start pipeline:
- Generates sample data: Creates 100 synthetic ledgers with 50 accounts and realistic transactions
- Builds transaction graph: Constructs a time-windowed graph with ~2000 edges
- Validates graph: Checks for isolated nodes, self-loops, and computes statistics
- Trains baseline model: Trains a LinkPredictor model for 10 epochs
- Saves reproducible results: Stores config, results, and metadata for reproducibility
Output:
benchmark_results/quickstart/
├── config.json # Full configuration with random seed
├── result.json # Training metrics and performance
└── metadata.json # Run metadata linking config and result
Example output:
================================================================================
AstroML Quick Start: Ingestion → Graph → Train Pipeline
================================================================================
[Step 1/5] Generating sample ledger data...
Generated 100 ledgers with 50 accounts
[Step 2/5] Building transaction graph...
Built graph with 2000 edges and 50 nodes
[Step 3/5] Creating benchmark configuration...
[Step 4/5] Training baseline model...
Epoch 0: Train Loss = 0.6931, Val Loss = 0.6892
Epoch 5: Train Loss = 0.4521, Val Loss = 0.4612
Training complete. Best metrics: {'auc': 0.92, 'precision': 0.88, 'recall': 0.85}
[Step 5/5] Saving benchmark results...
Saved config to benchmark_results/quickstart/config.json
Saved result to benchmark_results/quickstart/result.json
Saved metadata to benchmark_results/quickstart/metadata.json
✓ Quick start completed successfully!
Results saved to: benchmark_results/quickstart
================================================================================
git clone https://github.com/Traqora/astroml.git
cd astromlpython -m venv venv
source venv/bin/activate
pip install -r requirements.txtA lightweight Docker Compose setup is provided to spin up PostgreSQL and Redis with persistent volumes. Simply run:
docker compose up -dThis starts only the database and cache, letting you run Python scripts and training natively on your machine. Alternatively, you can configure your own database and update config/database.yaml.
Backfill ledgers:
python -m astroml.ingestion.backfill \
--start-ledger 1000000 \
--end-ledger 1100000Create a rolling time window graph:
python -m astroml.graph.build_snapshot --window 30dCreate benchmark datasets by injecting controlled fraud structures into a clean ledger copy:
python -m astroml.ingestion.synthetic_fraud_injector \
--input data/clean_ledger.jsonl \
--output data/ledger_with_fraud.jsonl \
--summary outputs/fraud_injection_summary.json \
--sybil-clusters 3 \
--sybil-cluster-size 8 \
--wash-loops 2 \
--wash-loop-size 5The injector appends transactions tagged with synthetic_fraud=true and fraud_pattern (sybil_cluster or wash_trading_loop) for downstream benchmarking.
python -m astroml.training.train_gcn- Liquidity Monitoring for the Stellar Community Fund
- Fraud / scam detection
- Account clustering
- Transaction risk scoring
- Temporal behavior modeling
- Self-supervised embeddings
- Network anomaly detection
AstroML emphasizes:
- Reproducibility
- Modular experimentation
- Scalable ingestion
- Temporal graph learning
- Production-ready ML pipelines
- Python
- PyTorch / PyTorch Geometric
- PostgreSQL
- NetworkX / graph tooling
- Real-time streaming ingestion
- Temporal GNN models
- Contrastive learning pipelines
- Feature store
- Model benchmarking suite
- Docker deployment
Contributions are welcome!
fork → branch → commit → PRPlease open issues for bugs or feature requests.
MIT License