✦ ✦ ✦ ✦ ✦ ✦ ✦
┌─┐┬─┐┌─┐┬┬
│ ┬├┬┘├─┤││
└─┘┴└─┴ ┴┴┴─┘
✦ ✦ ✦ ✦ ✦ ✦ ✦
Documentation: Miner • Validator • FAQ
grail delivers post-training for language models with cryptographically verifiable inference. It implements the GRAIL protocol (Guaranteed Rollout Authenticity via Inference Ledger) so that rollouts produced during RL are tied to a specific model and input, and can be independently verified by validators.
- grail (lowercase): The Bittensor subnet implementation orchestrating miners, validators, and a trainer for verifiable post-training
- GRAIL (uppercase): The protocol that proves rollout authenticity and model identity
- Miners generate rollouts with GRAIL proofs across multiple environments (currently Triton Kernel)
- Validators verify proofs, evaluate kernel correctness on-GPU, and set weights on-chain
- The trainer runs GRPO-based reinforcement learning on validated rollouts, publishing updated checkpoints each window
- Model checkpoints are shared via R2 storage and automatically loaded by miners and validators
Prover/Verifier implementation with:
- PRF-based index derivation and sketch commitments for token-level verification
- Verifier-supplied challenge (drand + chain/window context)
- Token and model-config validation; structured signatures bound to model identity
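The PRF-based index derivation and sketch commitment can be illustrated with a minimal sketch. This is not the actual grail API: it assumes HMAC-SHA256 as the PRF and uses a toy sum-based sketch over challenged positions, reduced mod `PRIME_Q`; all function names here are illustrative.

```python
import hashlib
import hmac

PRIME_Q = 2_147_483_647   # modulus for sketch arithmetic (protocol constant)
CHALLENGE_K = 32          # minimum number of challenged token positions

def derive_indices(challenge_seed: bytes, seq_len: int, k: int = CHALLENGE_K) -> list[int]:
    """PRF-derive k distinct token positions from a public challenge seed."""
    indices: list[int] = []
    counter = 0
    while len(indices) < min(k, seq_len):
        digest = hmac.new(challenge_seed, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        idx = int.from_bytes(digest[:8], "big") % seq_len
        if idx not in indices:
            indices.append(idx)
        counter += 1
    return sorted(indices)

def sketch_commitment(hidden_values: list[int], indices: list[int]) -> int:
    """Toy sketch: accumulate the challenged values mod PRIME_Q."""
    return sum(hidden_values[i] for i in indices) % PRIME_Q
```

Because miner and validator derive the same indices from the same public challenge, the validator can recompute the sketch from its own forward pass and compare it against the miner's commitment.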
GRPO-style rollout system with:
- Multiple rollouts per problem (16 per group), token-level logprob tracking
- 3-GPU pipeline mode: vLLM generation, HuggingFace proof computation, and kernel evaluation in parallel
- Shared `forward_single_layer` function ensuring bit-identical results between miner and validator
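The group-relative advantage at the heart of GRPO can be sketched as follows. This is the generic GRPO normalization (reward minus group mean, divided by group standard deviation), not grail's exact implementation; the epsilon guard is an assumption.

```python
import math

ROLLOUTS_PER_PROBLEM = 16

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one problem's rollout group: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]
```

Each problem's 16 rollouts form one group, so a rollout is rewarded relative to its siblings rather than against an absolute baseline.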
Modular environments with a single active environment set network-wide:
- Triton Kernel (`gpu_kernel/`) — current default: GPU kernel generation and on-GPU correctness evaluation using Triton
- 3-SAT (`sat.py`): Deterministic 3-SAT constraint satisfaction problems
- GSM8K (`gsm8k_env.py`): Math word problems with step-by-step reasoning verification
- MATH (`math_hendrycks_env.py`): Competition-level math from the Hendrycks MATH dataset
- MBPP (`python_code_env.py`): Python code generation from the MBPP benchmark
- HumanEval (`python_code_env.py`): Function-level code generation from OpenAI HumanEval
- Affine Trace/Logic (`affinetes/`): Affine type system trace and logic environments
Asynchronous GRPO trainer with:
- Per-window training on validated rollouts fetched from R2
- Delta checkpoint publishing (~99% bandwidth reduction vs full checkpoints)
- Adaptive KL, importance sampling, and chunked logit computation for memory efficiency
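Delta checkpointing can be sketched with a toy example using plain float lists in place of tensors; the real serialization format, parameter layout, and sparsity threshold are assumptions here.

```python
def make_delta(base: dict[str, list[float]], new: dict[str, list[float]],
               tol: float = 0.0) -> dict[str, list[float]]:
    """Store only per-parameter differences; drop entries whose change is within tol."""
    delta = {}
    for name, new_vals in new.items():
        diffs = [n - b for n, b in zip(new_vals, base[name])]
        if any(abs(d) > tol for d in diffs):
            delta[name] = diffs
    return delta

def apply_delta(base: dict[str, list[float]],
                delta: dict[str, list[float]]) -> dict[str, list[float]]:
    """Reconstruct the new checkpoint as base + delta."""
    out = {name: list(vals) for name, vals in base.items()}
    for name, diffs in delta.items():
        out[name] = [b + d for b, d in zip(out[name], diffs)]
    return out
```

Since a single GRPO window changes only a small fraction of weight mass meaningfully, shipping the difference against the previous checkpoint is far cheaper than re-uploading the full model each window.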
Object-storage utilities for miner/validator/trainer coordination:
- Upload mined rollouts, publish validated rollouts, checkpoint management via R2
- Randomness (`grail/infrastructure/drand.py`): Robust drand v2-first client with fallbacks and a mock beacon for testing
- Chain & credentials (`grail/infrastructure/chain.py`): Manages R2 credential commitments and metagraph access
Typer-based CLI with subcommands: `mine`, `validate`, `train`.
- Problem Generation: The active environment generates problems using public randomness derived from drand and the window's block hash
- Rollout Collection: Miners generate 16 GRPO rollouts per problem, tracking token ids and logprobs for proof construction
- GRAIL Verification: Validators verify tokens, the GRAIL commitment/opening against the claimed model, and environment-specific evaluation (e.g., kernel correctness for Triton Kernel)
- Reward & Weights: Validators score miners based on unique valid rollouts with a superlinear curve (`SUPERLINEAR_EXPONENT = 4.0`), then normalize and set weights on-chain
- Model Updates: The trainer collects validated rollouts, runs GRPO training, and publishes updated model checkpoints to R2 each window
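The superlinear scoring step can be sketched as follows: each miner's count of unique valid rollouts is raised to the exponent and then normalized, so sustained high-volume contributors earn disproportionately more weight. The function and input names are illustrative, not grail's actual API.

```python
SUPERLINEAR_EXPONENT = 4.0

def compute_weights(valid_counts: dict[str, int]) -> dict[str, float]:
    """Raise each miner's unique-valid-rollout count to the superlinear
    exponent, then normalize so the weights sum to 1."""
    scores = {uid: count ** SUPERLINEAR_EXPONENT for uid, count in valid_counts.items()}
    total = sum(scores.values())
    if total == 0:
        return {uid: 0.0 for uid in valid_counts}
    return {uid: s / total for uid, s in scores.items()}
```

With an exponent of 4.0, a miner producing twice as many valid rollouts as another receives 16x the raw score before normalization, which sharpens the incentive to maximize verifiable throughput.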
The GRAIL protocol ensures:
- Deterministic, publicly auditable challenges (drand + chain context)
- Model-binding proof of token processing; no substitution or replay
- Environment-agnostic verification: the protocol works across all supported environments
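The deterministic challenge construction can be sketched by mixing the public drand randomness with the window's block hash, so anyone can recompute the challenge after the fact. The hash choice (SHA-256) and field ordering below are assumptions, not the protocol's exact encoding.

```python
import hashlib

def challenge_seed(drand_randomness: bytes, block_hash: bytes, window: int) -> bytes:
    """Publicly auditable challenge seed: recomputable from drand + chain data."""
    h = hashlib.sha256()
    h.update(drand_randomness)
    h.update(block_hash)
    h.update(window.to_bytes(8, "big"))
    return h.digest()
```

Because every input is public and fixed before generation begins, miners cannot grind for favorable challenges and validators need no trusted coordinator.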
- PRIME_Q: 2,147,483,647 (mod prime for sketches)
- CHALLENGE_K: 32 (minimum challenged positions)
- PROOF_BATCH_SIZE: 16 (fixed constant for miner/validator numerical consistency)
- WINDOW_LENGTH: 30 blocks per scoring window
- ROLLOUTS_PER_PROBLEM: 16
- Triton Kernel (current default): GPU kernel generation — the model writes Triton kernels evaluated for correctness on a dedicated GPU
- 3-SAT: Variables 3–10, Clauses 5–20, Clause length 3; deterministic from seed
- GSM8K: Math word problems from the GSM8K dataset with step-by-step reasoning verification
- MATH: Competition-level mathematics from the Hendrycks MATH dataset
- MBPP: Python code generation from the Mostly Basic Python Problems benchmark
- HumanEval: Function-level code generation from the OpenAI HumanEval benchmark
- Affine Trace/Logic: Affine type system environments for trace and logic reasoning
- Hugging Face Transformers compatible, exposes token ids/logprobs
- Text-only environments (SAT, GSM8K, MATH, MBPP, HumanEval): 1 GPU minimum; any CUDA-capable accelerator
- Triton Kernel environment: 3 GPUs recommended — one for model inference (decoding), one for proof/logprob computation, and one for kernel evaluation. The kernel evaluation GPU should be A100 or H100 class to support Triton JIT compilation
For detailed hardware specifications, see `compute.min.yaml`.
For detailed setup instructions, please refer to the appropriate documentation:
See Miner Documentation for comprehensive setup instructions including:
- Hardware and environment requirements
- Wallet and network configuration
- R2/S3 credentials setup
- Pipeline mode configuration (3-GPU)
- Running the miner
See Validator Documentation for comprehensive setup instructions including:
- Hardware and environment requirements
- Docker Compose or native deployment
- Wallet and network configuration
- Running the validator
# Clone and install
git clone https://github.com/one-covenant/grail
cd grail
uv venv && source .venv/bin/activate
uv sync
# Configure environment
cp .env.example .env
# Edit .env with your wallet names, network, and R2 credentials
# Run miner
grail mine
# Run validator
grail validate
Important Notes:
- Randomness is fetched from drand; miners mix it with the window's block hash
- Rollouts are uploaded to object storage (R2/S3); validators fetch, verify, score, and set weights
- Model checkpoints evolve through training and are automatically loaded each window
- For monitoring:
- Miners and validators can log detailed metrics to the public W&B project: https://wandb.ai/tplr/grail
- Real-time system logs and network statistics are available at the Grafana dashboard: https://grail-grafana.tplr.ai/
- Verifiable Training: Cryptographic binding of rollouts to model and input
- Decentralized Post-Training: Internet-scale contribution and evaluation
- Environment Agnostic: Modular framework supports multiple problem domains
- Incentive Aligned: On-chain weights reward sustained, verifiable improvements
We welcome contributions to:
- New environments and reward vectors
- Protocol robustness and verification
- Performance and throughput improvements
- Documentation and examples