
Python · PyTorch · CUDA · SGLang · vLLM · RDMA · Kubernetes · Docker

# Hi, I'm David Bellamy.

I build infrastructure for agentic training and hold a PhD in statistics from Harvard.

I am currently at the Institute of Foundation Models (ifm.ai), building the reinforcement learning infrastructure for agentic training of a frontier-scale model that the team pretrained and midtrained in-house.

My work spans the entire agentic RL training stack: per-rollout sandbox runtimes, the agent layer, the Rust inference request router, the inference engines, the trainer, the reward computation pool, and the orchestration control plane. Prominent themes of my recent work include cross-image NCCL weight transport between trainer and rollout engines, disaggregated prefill/decode reliability on multi-rail HGX fabrics, tokenizer-consistent training-on-rollouts (TITO), and implementing [rollout routing replay](https://arxiv.org/abs/2510.11370) for large MoE RL training.


## Selected recent contributions (since March 2026)

I work across the full agentic-RL stack: agent layer ↔ Rust inference request router ↔ inference engine ↔ sandbox runtime ↔ trainer. Recent contributions span every layer.

### Inference-engine internals (vLLM / SGLang)

| PR | Stack | Summary |
| --- | --- | --- |
| vllm#38669 | CUDA / PTX | Fixes a Marlin MoE repack PTX incompatibility on H100/H200 under CUDA 12.8. |
| sglang#23003 | RDMA / PD | Per-GPU JSON mapping for `--disaggregation-ib-device`, enabling rail-aligned PD on HGX H100/H200. |
| LLM360/sglang#12 | PyNCCL | Opt-in PyNCCL transport for the weight-update side group, so trainer and rollout containers can ship different libnccl versions; key for weight updates between trainer engines and inference engines. |
| LLM360/sglang#14 | InfiniBand verbs | Serializes `ibv_reg_mr` to defend Mooncake against an nvidia-peermem race that segfaults under concurrent GPU memory registration in SR-IOV VF environments. |

### Inference request router (Rust)

| PR | Summary |
| --- | --- |
| smg#1130 | `#[serde(flatten)]` catch-all on `ChatCompletionRequest` so engine-specific JSON fields (SGLang's `return_routed_experts`, etc.) survive gateway deserialization. Fixes upstream sglang issue #22740. |
| smg#1239 | Mirrors the flatten to six chat response structs so `routed_experts` and `completion_token_ids` round-trip end-to-end. |
| smg#1238 | Strips `content-length` in `preserve_response_headers`, catching a latent defensive bug in body-modification paths. |
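The `#[serde(flatten)]` fix is the "catch-all" deserialization pattern: known fields parse normally, and unrecognized fields land in a side map instead of being dropped, so they survive re-serialization at the gateway. A Python analogue of the same idea (class and field names are illustrative, not the router's actual types):

```python
import json
from dataclasses import dataclass, field

@dataclass
class ChatCompletionRequest:
    model: str
    messages: list
    extra: dict = field(default_factory=dict)  # the "flatten" catch-all

    @classmethod
    def from_json(cls, raw: str) -> "ChatCompletionRequest":
        obj = json.loads(raw)
        return cls(
            model=obj.pop("model"),
            messages=obj.pop("messages"),
            extra=obj,  # whatever is left: engine-specific fields
        )

    def to_json(self) -> str:
        # Re-serialize known fields plus everything captured in `extra`.
        return json.dumps({"model": self.model, "messages": self.messages, **self.extra})

# An engine-specific field unknown to the gateway's schema:
raw = '{"model": "m", "messages": [], "return_routed_experts": true}'
round_tripped = json.loads(ChatCompletionRequest.from_json(raw).to_json())
# round_tripped still contains "return_routed_experts"
```

Without the catch-all, a strict schema at the gateway silently strips any field it doesn't know about, which is exactly how `return_routed_experts` was being lost before smg#1130.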

### Distributed training reliability (Python trainer)

| PR | Summary |
| --- | --- |
| LLM360/miles#11 | MoE routing-replay correctness: per-row complement padding avoids within-row expert-id duplicates. |
| miles#888 | Restart-tolerant session proxy: router restarts (Ray failover, node loss) become transparent to active agents instead of cascading 404s. |
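The idea behind per-row complement padding: when a row's recorded expert ids must be padded to a fixed width, draw the pad ids from that row's complement so the replayed row never contains duplicate expert ids. A minimal sketch under that reading of LLM360/miles#11 (function name and deterministic complement order are assumptions):

```python
def pad_row(expert_ids: list[int], width: int, num_experts: int) -> list[int]:
    """Pad a row of routed expert ids to `width` without duplicates.

    Pad ids are drawn from the complement of the row's own ids, so the
    padded row is still a set of distinct experts.
    """
    seen = set(expert_ids)
    # Deterministic walk over experts NOT already in this row.
    complement = (e for e in range(num_experts) if e not in seen)
    padded = list(expert_ids)
    while len(padded) < width:
        padded.append(next(complement))
    return padded

row = pad_row([5, 2], width=4, num_experts=8)
# → [5, 2, 0, 1]: original ids first, then complement ids as padding
```

Padding with a constant (e.g. expert 0) would instead produce within-row duplicates whenever expert 0 was already routed, which is the correctness bug this avoids.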

### Agentic RL system design (TITO)

TITO (token-in / token-out) keeps the exact inference-engine token IDs end-to-end, eliminating tokenizer-drift bugs in training-on-rollouts. The plumbing spans all four open-source layers of the stack:

- harbor#1454: agent-layer infrastructure (+379 lines).
- LLM360/sglang#13 / #15: engine surfaces completion token IDs and the tokenizer SHA-256 on `/get_model_info`.
- smg#1239: gateway preserves engine-specific response fields.
- miles#1024: trainer-side TITO tokenizer supports agent-inserted assistant turns (e.g. terminus-2 / SWE-agent self-reflection).
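Why tokenizer drift happens at all: decoding a rollout to text and re-tokenizing it for training need not reproduce the ids the engine actually sampled, so the trainer silently computes losses on a different token sequence. A toy illustration with a hypothetical three-token vocabulary (not any real tokenizer):

```python
# Toy tokenizer: greedy longest-match encoding over a tiny vocabulary.
VOCAB = {"ab": 0, "a": 1, "b": 2}
INV = {v: k for k, v in VOCAB.items()}

def decode(ids):
    return "".join(INV[i] for i in ids)

def encode(text):
    # Greedy longest-match tokenization (as BPE-style tokenizers do).
    ids, i = [], 0
    while i < len(text):
        for tok in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(tok, i):
                ids.append(VOCAB[tok])
                i += len(tok)
                break
    return ids

engine_ids = [1, 2]                       # engine sampled "a" then "b"
retokenized = encode(decode(engine_ids))  # → [0]: "ab" merges into one token
# Same text, different token ids: the trainer would score a sequence the
# engine never produced. TITO carries [1, 2] through unchanged instead.
```

This is exactly the class of bug that carrying completion token IDs end-to-end removes by construction: there is no decode/re-encode step left to drift.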

### Per-rollout sandbox runtime (closed-source)

I designed and operate the per-rollout sandbox layer: a Kubernetes cluster that spins up an isolated container for each agent rollout and persists tool and filesystem state across the rollout's lifetime. Scaling to 12,000 concurrent sandbox pods surfaced (and required solving) cluster-wide network-bandwidth saturation and host-level file-descriptor exhaustion.


## Earlier work

| Project | Description |
| --- | --- |
| grpo-gsm8k | Bare-metal GRPO on GSM8K. Decoupled training (Torch) and inference (vLLM). 83.2% pass@1, matching SFT baselines while recovering reasoning capabilities. (W&B report) |
| Labrador | ML4H 2024 Best Paper. Empirical limits of masked-LM pretraining on tabular EHR data. |
| suttonbarto | Sutton & Barto exercises with rigorous derivations. |
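The grpo-gsm8k project rests on GRPO's group-relative advantage: sample a group of rollouts per prompt, score each, and normalize rewards within the group, so no learned value function is needed. A minimal sketch (function name is illustrative, not the repo's API):

```python
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize a group's rewards to zero mean and unit scale.

    Each rollout's advantage is its reward relative to the group:
    (r - mean) / (std + eps), with eps guarding against all-equal groups.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# e.g. 4 rollouts for one GSM8K prompt, reward 1.0 iff the answer is correct:
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
# correct rollouts get positive advantage, incorrect ones negative,
# and the group's advantages sum to ~0
```

Because the baseline is the group mean rather than a critic's value estimate, the trainer and inference engines can be fully decoupled, which is what makes the bare-metal Torch-plus-vLLM split in grpo-gsm8k workable.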

## Pinned repositories

1. grpo-gsm8k: RL post-training open LLMs for math reasoning (Python).
2. suttonbarto: Solutions to the exercises in Sutton & Barto's textbook *Reinforcement Learning: An Introduction* (Python).
3. labrador: Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data (Python).
4. beamlab-hsph/Neural-Moment-Matching-Regression: Code for our NeurIPS 2022 work "Deep Learning Methods for Proximal Inference via Maximum Moment Restriction" (Python).