I am an infrastructure engineer for agentic training, with a PhD in statistics from Harvard.
I am currently at Institute of Foundation Models (ifm.ai) building the reinforcement learning infrastructure for agentic training of a frontier-scale model that the team pretrained and midtrained in-house.
My work spans the entire agentic RL training stack: per-rollout sandbox runtimes, the agent layer, the Rust inference request router, the inference engines, the trainer, the reward computation pool, and the orchestration control plane. Prominent themes of my recent work include cross-image NCCL weight transport between trainer and rollout engines, disaggregated prefill/decode reliability on multi-rail HGX fabrics, tokenizer-consistent training-on-rollouts (TITO), and implementing rollout routing replay (https://arxiv.org/abs/2510.11370) for large MoE RL training.
Recent open-source contributions span every layer of that stack: agent layer ↔ Rust inference request router ↔ inference engine ↔ sandbox runtime ↔ trainer.
Inference-engine internals (vLLM / SGLang)
| PR | Stack | Summary |
|---|---|---|
| vllm#38669 | CUDA / PTX | Fix Marlin MoE repack PTX incompatibility on H100/H200 under CUDA 12.8. |
| sglang#23003 | RDMA / PD | Per-GPU JSON mapping for --disaggregation-ib-device, enabling rail-aligned PD on HGX H100/H200. |
| LLM360/sglang#12 | PyNCCL | Opt-in PyNCCL transport for the weight-update side group, letting trainer and rollout containers ship different libnccl versions; a key enabler for weight updates from trainer engines to inference engines. |
| LLM360/sglang#14 | InfiniBand verbs | Serialize ibv_reg_mr to defend mooncake against an nvidia-peermem race that segfaults under concurrent GPU memory registration in SR-IOV VF environments. |
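To make the rail-alignment idea concrete, here is a minimal sketch of a per-GPU IB-device mapping and a check that no two GPUs share a NIC. The device names and the JSON shape are illustrative assumptions, not SGLang's exact schema for `--disaggregation-ib-device`.

```python
import json

# Hypothetical per-GPU mapping: each GPU index is pinned to the IB device
# (NIC) on its own PCIe rail, so prefill->decode KV transfers stay rail-local.
gpu_to_ib = {str(gpu): f"mlx5_{gpu}" for gpu in range(8)}

def rail_aligned(mapping: dict) -> bool:
    """Check that no two GPUs share an IB device (one NIC per rail)."""
    devices = list(mapping.values())
    return len(devices) == len(set(devices))

# The JSON string would be the value handed to the engine flag.
flag_value = json.dumps(gpu_to_ib)
assert rail_aligned(gpu_to_ib)
```

Without per-GPU granularity, all ranks on a node funnel KV traffic through one NIC; the rail-aligned mapping spreads it across all rails of the HGX fabric.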
Inference request router (Rust)
| PR | Summary |
|---|---|
| smg#1130 | #[serde(flatten)] catch-all on ChatCompletionRequest so engine-specific JSON fields (SGLang's return_routed_experts etc.) survive gateway deserialization. Fixes upstream sglang issue #22740. |
| smg#1239 | Mirrors flatten to six chat response structs so routed_experts and completion_token_ids round-trip end-to-end. |
| smg#1238 | Strip stale Content-Length in preserve_response_headers; fixes a latent bug in body-modification paths. |
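The catch-all pattern behind smg#1130 can be sketched as follows. This is a Python analogue of Rust's `#[serde(flatten)]`, not the router's actual code: known fields are typed, and every unknown key is stashed and re-emitted on serialization so engine-specific fields survive the gateway round-trip.

```python
import json
from dataclasses import dataclass, field

KNOWN = {"model", "messages"}  # illustrative subset of typed fields

@dataclass
class ChatCompletionRequest:
    model: str
    messages: list
    extra: dict = field(default_factory=dict)  # catch-all for unknown keys

    @classmethod
    def from_json(cls, raw: str) -> "ChatCompletionRequest":
        obj = json.loads(raw)
        known = {k: obj.pop(k) for k in list(obj) if k in KNOWN}
        return cls(**known, extra=obj)  # leftovers land in `extra`

    def to_json(self) -> str:
        # Unknown keys are flattened back to the top level on the way out.
        return json.dumps({"model": self.model,
                           "messages": self.messages,
                           **self.extra})

req = ChatCompletionRequest.from_json(
    '{"model": "m", "messages": [], "return_routed_experts": true}'
)
assert json.loads(req.to_json())["return_routed_experts"] is True
```

Without the catch-all, a strictly typed gateway silently drops fields like `return_routed_experts` that only the engine understands.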
Distributed training reliability (Python trainer)
| PR | Summary |
|---|---|
| LLM360/miles#11 | MoE routing-replay correctness: per-row complement padding avoids within-row expert-id duplicates. |
| miles#888 | Restart-tolerant session proxy: router restarts (Ray failover, node loss) become transparent to active agents instead of cascading 404s. |
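The padding fix in LLM360/miles#11 can be illustrated with a small sketch (names and shapes are mine, not the miles implementation): each row's recorded expert ids are padded to a fixed width using ids drawn from that row's complement, so no expert id repeats within a row. Duplicates within a row would corrupt replayed top-k dispatch.

```python
# Illustrative per-row complement padding for MoE routing replay.
def pad_routed_experts(rows, num_experts, width):
    padded = []
    for row in rows:
        seen = set(row)
        # Ids NOT already routed in this row, in ascending order.
        complement = (e for e in range(num_experts) if e not in seen)
        out = list(row)
        while len(out) < width:
            out.append(next(complement))  # never duplicates within the row
        padded.append(out)
    return padded

rows = pad_routed_experts([[3, 1], [0]], num_experts=8, width=4)
# Every row has `width` distinct expert ids.
assert all(len(r) == 4 and len(set(r)) == 4 for r in rows)
```

Padding with a constant (e.g. expert 0) is the naive alternative, and is exactly what produces within-row duplicates.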
Agentic RL system design (TITO)
TITO (token-in / token-out) keeps exact inference-engine token IDs end-to-end, eliminating tokenizer-drift bugs in training-on-rollouts. The plumbing spans all four open-source layers of the stack:
- harbor#1454: agent-layer infrastructure (+379 lines).
- LLM360/sglang#13 / #15: engine surfaces completion token IDs and tokenizer SHA256 on /get_model_info.
- smg#1239: gateway preserves engine-specific response fields.
- miles#1024: trainer-side TITO tokenizer supports agent-inserted assistant turns (e.g. terminus-2 / SWE-agent self-reflection).
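A minimal sketch of the tokenizer-consistency check in the TITO spirit: the trainer hashes its local tokenizer file and compares against the digest the engine reports. The `engine_info` dict and `tokenizer_sha256` field name are assumptions here; in the real stack the digest comes from SGLang's /get_model_info endpoint (LLM360/sglang#13 / #15).

```python
import hashlib

def tokenizer_sha256(path: str) -> str:
    """SHA-256 of the tokenizer file, streamed in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check_tokenizer(engine_info: dict, local_path: str) -> None:
    """Fail fast if trainer and engine tokenizers differ."""
    if engine_info.get("tokenizer_sha256") != tokenizer_sha256(local_path):
        raise RuntimeError("tokenizer drift: trainer and engine disagree")
```

Failing fast here is the point: a silently drifted tokenizer produces training-on-rollouts batches whose token IDs no longer match what the engine sampled.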
Per-rollout sandbox runtime (closed-source)
I designed and operate the per-rollout sandbox layer: a Kubernetes cluster that spins up an isolated container for each agent rollout and persists tool and filesystem state across the rollout's lifetime. Scaling to 12,000 concurrent sandbox pods surfaced, and required solving, cluster-wide network-bandwidth saturation and host-level file-descriptor exhaustion.
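A back-of-envelope version of the file-descriptor problem (all numbers here are illustrative, not measured values): each sandbox pod holds open sockets, pipes, and files, and at high pod density the host-wide limit (Linux `fs.file-max`), not any per-process ulimit, is what saturates first.

```python
# Illustrative FD budget for one sandbox node.
PODS_PER_NODE = 250        # hypothetical packing density
FDS_PER_POD = 2_000        # sockets + pipes + open files per sandbox (assumed)
HOST_FILE_MAX = 1_000_000  # example fs.file-max on the node

needed = PODS_PER_NODE * FDS_PER_POD
headroom = HOST_FILE_MAX - needed
print(f"need ~{needed} FDs, headroom {headroom}")
```

When `needed` approaches `HOST_FILE_MAX`, unrelated host daemons start failing with EMFILE/ENFILE, which is why the limit has to be budgeted per node rather than per pod.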
Selected projects
| Project | Description |
|---|---|
| grpo-gsm8k | Bare-metal GRPO on GSM8k. Decoupled training (Torch) and inference (vLLM). 83.2% Pass@1, matching SFT baselines while recovering reasoning capabilities. (W&B report) |
| Labrador | ML4H 2024 Best Paper. Empirical limits of masked LM pretraining on tabular EHR data. |
| suttonbarto | Sutton & Barto exercises with rigorous derivations. |