
**Stack:** Python · PyTorch · CUDA · SGLang · vLLM · RDMA · Kubernetes · Docker

# Hi, I'm David Bellamy.

I am an infrastructure engineer for agentic training and hold a PhD in statistics from Harvard.

I am currently at the Institute of Foundation Models (ifm.ai), building the reinforcement learning infrastructure for agentic training of a frontier-scale model that the team pretrained and mid-trained in-house.

My work spans the entire agentic-RL training stack: per-rollout sandbox runtimes, the agent layer, the Rust inference request router, the inference engines, the trainer, the reward-computation pool, and the orchestration control plane. Prominent themes of my recent work include cross-image NCCL weight transport between the trainer and rollout engines, disaggregated prefill/decode reliability on multi-rail HGX fabrics, tokenizer-consistent training-on-rollouts (TITO), and implementing [rollout routing replay](https://arxiv.org/abs/2510.11370) for large-scale MoE RL training.


## Selected recent contributions (since March 2026)

I work across the full agentic-RL stack: agent layer ↔ Rust inference request router ↔ inference engine ↔ sandbox runtime ↔ trainer. Recent contributions span every layer.

### Inference-engine internals (vLLM / SGLang)

| PR | Stack | Summary |
| --- | --- | --- |
| vllm#38669 | CUDA / PTX | Fix Marlin MoE repack PTX incompatibility on H100/H200 under CUDA 12.8. |
| sglang#23003 | RDMA / PD | Per-GPU JSON mapping for `--disaggregation-ib-device`, enabling rail-aligned prefill/decode on HGX H100/H200. |
| LLM360/sglang#12 | PyNCCL | Opt-in PyNCCL transport for the weight-update side group, so trainer and rollout containers can ship different `libnccl` versions; key for weight updates between trainer engines and inference engines. |
| LLM360/sglang#14 | InfiniBand verbs | Serialize `ibv_reg_mr` to defend Mooncake against an `nvidia-peermem` race that segfaults under concurrent GPU memory registration in SR-IOV VF environments. |
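
The PyNCCL side group in LLM360/sglang#12 follows a common pattern: a dedicated collective group, separate from the training group, over which the trainer broadcasts updated weights to the rollout engines. Below is a minimal sketch of that pattern using torch.distributed's NCCL backend (the PR itself uses SGLang's PyNCCL wrapper; the addresses, ports, and ranks here are hypothetical):

```python
import os
import torch
import torch.distributed as dist

def join_weight_update_group(rank: int, world_size: int) -> None:
    """Join a NCCL group used only for trainer -> rollout weight transport.

    Rendezvous details are hypothetical; in practice they are exchanged
    out-of-band (e.g. over the engines' HTTP control endpoints).
    """
    os.environ.setdefault("MASTER_ADDR", "10.0.0.1")  # hypothetical address
    os.environ.setdefault("MASTER_PORT", "29500")     # hypothetical port
    torch.cuda.set_device(rank % torch.cuda.device_count())
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

def push_weights(model: torch.nn.Module, src_rank: int = 0) -> None:
    """Trainer (src_rank) broadcasts every parameter; rollout engines
    receive into their own copies of the same architecture."""
    for param in model.parameters():
        dist.broadcast(param.data, src=src_rank)
```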

### Inference request router (Rust)

| PR | Summary |
| --- | --- |
| smg#1130 | `#[serde(flatten)]` catch-all on `ChatCompletionRequest` so engine-specific JSON fields (SGLang's `return_routed_experts`, etc.) survive gateway deserialization. Fixes upstream sglang issue #22740. |
| smg#1239 | Mirrors the flatten catch-all to six chat response structs so `routed_experts` and `completion_token_ids` round-trip end-to-end. |
| smg#1238 | Strips `content-length` in `preserve_response_headers`, catching a latent defensive bug in body-modification paths. |
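
The serde fix is Rust-specific, but the failure mode is easy to demonstrate in Python (keeping this README's examples in one language): a strict pydantic request model silently drops engine-specific keys, while an `extra="allow"` model, the rough analogue of a `#[serde(flatten)]` catch-all map, carries them through. Model names here are illustrative:

```python
from pydantic import BaseModel, ConfigDict

class StrictChatRequest(BaseModel):
    """Default pydantic behavior: unknown JSON keys are silently dropped."""
    model: str
    messages: list[dict]

class FlattenedChatRequest(BaseModel):
    """extra='allow' keeps unrecognized keys, analogous to a
    #[serde(flatten)] HashMap catch-all in Rust."""
    model_config = ConfigDict(extra="allow")
    model: str
    messages: list[dict]

payload = {"model": "m", "messages": [], "return_routed_experts": True}
assert "return_routed_experts" not in StrictChatRequest(**payload).model_dump()
assert FlattenedChatRequest(**payload).model_dump()["return_routed_experts"] is True
```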

### Distributed training reliability (Python trainer)

| PR | Summary |
| --- | --- |
| LLM360/miles#11 | MoE routing-replay correctness: per-row complement padding avoids within-row expert-id duplicates. |
| miles#888 | Restart-tolerant session proxy: router restarts (Ray failover, node loss) become transparent to active agents instead of cascading 404s. |
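
A toy sketch of the per-row complement-padding idea behind LLM360/miles#11 (names and shapes are illustrative): pad each row of routed expert IDs using only experts *not* already routed in that row, so a padded slot can never collide with a real routing decision.

```python
import torch

def pad_with_complement(row: torch.Tensor, width: int, num_experts: int) -> torch.Tensor:
    """Pad a 1-D tensor of routed expert ids out to `width`, drawing pad
    values from experts NOT already present in this row, so padding never
    introduces a within-row duplicate expert id."""
    used = set(row.tolist())
    spare = [e for e in range(num_experts) if e not in used]
    pad = torch.tensor(spare[: width - row.numel()], dtype=row.dtype)
    return torch.cat([row, pad])

# e.g. a token routed to experts {3, 7}, padded to width 4 out of 16 experts:
print(pad_with_complement(torch.tensor([3, 7]), width=4, num_experts=16))
# tensor([3, 7, 0, 1])
```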

### Agentic RL system design (TITO)

TITO (token-in / token-out) keeps the inference engine's exact token IDs end-to-end, eliminating tokenizer-drift bugs in training-on-rollouts; a toy illustration of the drift failure mode follows the list below. The plumbing spans all four open-source layers of the stack:

- harbor#1454: agent-layer infrastructure (+379 lines).
- LLM360/sglang#13 / #15: engine surfaces completion token IDs and the tokenizer's SHA-256 on `/get_model_info`.
- smg#1239: gateway preserves engine-specific response fields.
- miles#1024: trainer-side TITO tokenizer supports agent-inserted assistant turns (e.g. terminus-2 / SWE-agent self-reflection).
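
Why the token IDs must be carried rather than re-derived: decode-then-re-encode is not guaranteed to round-trip, so a trainer that retokenizes rollout text can silently train on a different token sequence than the engine actually sampled. A toy check, with an illustrative tokenizer and hypothetical IDs:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative tokenizer choice

engine_ids = [31373, 995, 50256]  # hypothetical ids sampled by the engine
text = tok.decode(engine_ids)
retok_ids = tok.encode(text)

# TITO trains on engine_ids directly; retokenized ids may not match them.
if retok_ids != engine_ids:
    print(f"tokenizer drift: {engine_ids} -> {retok_ids}")
```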

### Per-rollout sandbox runtime (closed-source)

I designed and operate the per-rollout sandbox layer: a Kubernetes cluster that spins up an isolated container for each agent rollout and persists tool and filesystem state across the rollout's lifetime. It has scaled to 12,000 concurrent sandbox pods, which surfaced (and required solving) cluster-wide network-bandwidth saturation and host-level file-descriptor exhaustion.
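
The sandbox layer itself is closed-source, but the pod-per-rollout pattern can be sketched with the official Kubernetes Python client (the image, namespace, naming scheme, and resource limits below are all hypothetical):

```python
from kubernetes import client, config

def launch_rollout_sandbox(rollout_id: str) -> client.V1Pod:
    """Create one isolated pod per agent rollout; the pod lives for the
    rollout's lifetime and holds its tool/filesystem state in between steps."""
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(
            name=f"sandbox-{rollout_id}",  # hypothetical naming scheme
            labels={"app": "rollout-sandbox"},
        ),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="sandbox",
                    image="registry.example/sandbox:latest",  # hypothetical image
                    resources=client.V1ResourceRequirements(
                        limits={"cpu": "2", "memory": "4Gi"},
                    ),
                )
            ],
        ),
    )
    return client.CoreV1Api().create_namespaced_pod(namespace="rollouts", body=pod)
```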


## Earlier work

| Project | Description |
| --- | --- |
| grpo-gsm8k | Bare-metal GRPO on GSM8k with decoupled training (PyTorch) and inference (vLLM). 83.2% Pass@1, matching SFT baselines while recovering reasoning capabilities. (W&B report) |
| Labrador | ML4H 2024 Best Paper. Empirical limits of masked-LM pretraining on tabular EHR data. |
| suttonbarto | Sutton & Barto exercises with rigorous derivations. |
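
For reference, the heart of GRPO as used in grpo-gsm8k is group-relative credit assignment: sample a group of completions per prompt and normalize each completion's reward by the group's mean and standard deviation. A minimal sketch:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards, one row per prompt.
    Returns group-relative advantages: (r - group mean) / (group std + eps)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# e.g. 8 sampled answers for one GSM8k problem, reward 1 if correct else 0:
print(grpo_advantages(torch.tensor([[1., 0., 0., 1., 1., 0., 0., 0.]])))
```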
