
Python · PyTorch · CUDA · SGLang · vLLM · RDMA · Kubernetes · Docker

# Hi, I'm David Bellamy.

I build infrastructure for agentic training and hold a PhD in statistics from Harvard.

I am currently at the Institute of Foundation Models (ifm.ai), building the reinforcement learning infrastructure for agentic training of a frontier-scale model that the team pretrained and midtrained in-house.

My work spans the entire agentic RL training stack: per-rollout sandbox runtimes, the agent layer, the Rust inference request router, the inference engines, the trainer, the reward computation pool, and the orchestration control plane. Prominent themes of my recent work include cross-image NCCL weight transport between trainer and rollout engines, disaggregated prefill/decode reliability on multi-rail HGX fabrics, tokenizer-consistent training-on-rollouts (TITO), and implementing [rollout routing replay](https://arxiv.org/abs/2510.11370) for large MoE RL training.


## Selected recent contributions (since March 2026)

I work across the full agentic-RL stack: agent layer ↔ Rust inference request router ↔ inference engine ↔ sandbox runtime ↔ trainer. Recent contributions span every layer.

### Inference-engine internals (vLLM / SGLang)

| PR | Stack | Summary |
| --- | --- | --- |
| vllm#38669 | CUDA / PTX | Fixes a Marlin MoE repack PTX incompatibility on H100/H200 under CUDA 12.8. |
| sglang#23003 | RDMA / PD | Per-GPU JSON mapping for `--disaggregation-ib-device`, enabling rail-aligned PD on HGX H100/H200. |
| LLM360/sglang#12 | PyNCCL | Opt-in PyNCCL transport for the weight-update side group, so trainer and rollout containers can ship different libnccl versions; key for weight updates between trainer engines and inference engines. |
| LLM360/sglang#14 | InfiniBand verbs | Serializes `ibv_reg_mr` to defend Mooncake against an nvidia-peermem race that segfaults under concurrent GPU memory registration in SR-IOV VF environments. |

### Inference request router (Rust)

| PR | Summary |
| --- | --- |
| smg#1130 | `#[serde(flatten)]` catch-all on `ChatCompletionRequest` so engine-specific JSON fields (SGLang's `return_routed_experts`, etc.) survive gateway deserialization. Fixes upstream sglang issue #22740. |
| smg#1239 | Mirrors the flatten to six chat response structs so `routed_experts` and `completion_token_ids` round-trip end-to-end. |
| smg#1238 | Strips `content-length` in `preserve_response_headers`, catching a latent defensive bug in body-modification paths. |
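The `#[serde(flatten)]` fix is the "catch-all" deserialization pattern: known fields parse normally, and unrecognized fields land in a side map instead of being dropped, so they survive re-serialization at the gateway. A Python analogue of the same idea (class and field names are illustrative, not the router's actual types):

```python
import json
from dataclasses import dataclass, field

@dataclass
class ChatCompletionRequest:
    model: str
    messages: list
    extra: dict = field(default_factory=dict)  # the "flatten" catch-all

    @classmethod
    def from_json(cls, raw: str) -> "ChatCompletionRequest":
        obj = json.loads(raw)
        return cls(
            model=obj.pop("model"),
            messages=obj.pop("messages"),
            extra=obj,  # whatever is left: engine-specific fields
        )

    def to_json(self) -> str:
        # Re-serialize known fields plus everything captured in `extra`.
        return json.dumps({"model": self.model, "messages": self.messages, **self.extra})

# An engine-specific field unknown to the gateway's schema:
raw = '{"model": "m", "messages": [], "return_routed_experts": true}'
round_tripped = json.loads(ChatCompletionRequest.from_json(raw).to_json())
# round_tripped still contains "return_routed_experts"
```

Without the catch-all, a strict schema at the gateway silently strips any field it doesn't know about, which is exactly how `return_routed_experts` was being lost before smg#1130.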

### Distributed training reliability (Python trainer)

| PR | Summary |
| --- | --- |
| LLM360/miles#11 | MoE routing-replay correctness: per-row complement padding avoids within-row expert-id duplicates. |
| miles#888 | Restart-tolerant session proxy: router restarts (Ray failover, node loss) become transparent to active agents instead of cascading 404s. |
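The idea behind per-row complement padding: when a row's recorded expert ids must be padded to a fixed width, draw the pad ids from that row's complement so the replayed row never contains duplicate expert ids. A minimal sketch under that reading of LLM360/miles#11 (function name and deterministic complement order are assumptions):

```python
def pad_row(expert_ids: list[int], width: int, num_experts: int) -> list[int]:
    """Pad a row of routed expert ids to `width` without duplicates.

    Pad ids are drawn from the complement of the row's own ids, so the
    padded row is still a set of distinct experts.
    """
    seen = set(expert_ids)
    # Deterministic walk over experts NOT already in this row.
    complement = (e for e in range(num_experts) if e not in seen)
    padded = list(expert_ids)
    while len(padded) < width:
        padded.append(next(complement))
    return padded

row = pad_row([5, 2], width=4, num_experts=8)
# → [5, 2, 0, 1]: original ids first, then complement ids as padding
```

Padding with a constant (e.g. expert 0) would instead produce within-row duplicates whenever expert 0 was already routed, which is the correctness bug this avoids.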

### Agentic RL system design (TITO)

TITO (token-in / token-out) keeps the exact inference-engine token IDs end-to-end, eliminating tokenizer-drift bugs in training-on-rollouts. The plumbing spans all four open-source layers of the stack:

- harbor#1454: agent-layer infrastructure (+379 lines).
- LLM360/sglang#13 / #15: engine surfaces completion token IDs and the tokenizer SHA-256 on `/get_model_info`.
- smg#1239: gateway preserves engine-specific response fields.
- miles#1024: trainer-side TITO tokenizer supports agent-inserted assistant turns (e.g. terminus-2 / SWE-agent self-reflection).
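Why tokenizer drift happens at all: decoding a rollout to text and re-tokenizing it for training need not reproduce the ids the engine actually sampled, so the trainer silently computes losses on a different token sequence. A toy illustration with a hypothetical three-token vocabulary (not any real tokenizer):

```python
# Toy tokenizer: greedy longest-match encoding over a tiny vocabulary.
VOCAB = {"ab": 0, "a": 1, "b": 2}
INV = {v: k for k, v in VOCAB.items()}

def decode(ids):
    return "".join(INV[i] for i in ids)

def encode(text):
    # Greedy longest-match tokenization (as BPE-style tokenizers do).
    ids, i = [], 0
    while i < len(text):
        for tok in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(tok, i):
                ids.append(VOCAB[tok])
                i += len(tok)
                break
    return ids

engine_ids = [1, 2]                       # engine sampled "a" then "b"
retokenized = encode(decode(engine_ids))  # → [0]: "ab" merges into one token
# Same text, different token ids: the trainer would score a sequence the
# engine never produced. TITO carries [1, 2] through unchanged instead.
```

This is exactly the class of bug that carrying completion token IDs end-to-end removes by construction: there is no decode/re-encode step left to drift.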

### Per-rollout sandbox runtime (closed-source)

I designed and operate the per-rollout sandbox layer: a Kubernetes cluster that spins up an isolated container for each agent rollout and persists tool and filesystem state across the rollout's lifetime. Scaling to 12,000 concurrent sandbox pods surfaced (and required solving) cluster-wide network-bandwidth saturation and host-level file-descriptor exhaustion.


## Earlier work

| Project | Description |
| --- | --- |
| grpo-gsm8k | Bare-metal GRPO on GSM8K. Decoupled training (Torch) and inference (vLLM). 83.2% pass@1, matching SFT baselines while recovering reasoning capabilities. (W&B report) |
| Labrador | ML4H 2024 Best Paper. Empirical limits of masked-LM pretraining on tabular EHR data. |
| suttonbarto | Sutton & Barto exercises with rigorous derivations. |
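The grpo-gsm8k project rests on GRPO's group-relative advantage: sample a group of rollouts per prompt, score each, and normalize rewards within the group, so no learned value function is needed. A minimal sketch (function name is illustrative, not the repo's API):

```python
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize a group's rewards to zero mean and unit scale.

    Each rollout's advantage is its reward relative to the group:
    (r - mean) / (std + eps), with eps guarding against all-equal groups.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# e.g. 4 rollouts for one GSM8K prompt, reward 1.0 iff the answer is correct:
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
# correct rollouts get positive advantage, incorrect ones negative,
# and the group's advantages sum to ~0
```

Because the baseline is the group mean rather than a critic's value estimate, the trainer and inference engines can be fully decoupled, which is what makes the bare-metal Torch-plus-vLLM split in grpo-gsm8k workable.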

## Pinned repositories

1. grpo-gsm8k: RL post-training open LLMs for math reasoning (Python).
2. suttonbarto: Solutions to the exercises in Sutton & Barto's textbook *Reinforcement Learning: An Introduction* (Python).
3. labrador: Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data (Python).
4. beamlab-hsph/Neural-Moment-Matching-Regression: Code for our NeurIPS 2022 work "Deep Learning Methods for Proximal Inference via Maximum Moment Restriction" (Python).