suga-ucsd/distributed-inference-

# Distributed LLM Inference Pipeline

Implements:

- Tensor parallelism across 2 GPUs
- Pipeline parallelism across 2 stages
- gRPC message passing between workers
- Prometheus metrics (latency, GPU utilization)
- Grafana dashboard
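For the gRPC message passing, a stage-to-stage transfer service might be defined along these lines. This is a hypothetical sketch — the message and service names (`TensorChunk`, `StageTransfer`, `SendActivations`) are illustrative assumptions, not taken from this repo's actual `.proto` files:

```proto
syntax = "proto3";

package inference;

// Hypothetical message carrying a flattened activation tensor
// from one pipeline stage to the next.
message TensorChunk {
  repeated float values = 1;   // flattened activation data
  repeated int64 shape = 2;    // original tensor shape
  int32 microbatch_id = 3;     // which microbatch this chunk belongs to
}

message Ack {
  bool ok = 1;
}

service StageTransfer {
  // Push activations downstream; the receiving stage acknowledges.
  rpc SendActivations(TensorChunk) returns (Ack);
}
```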

## Run Tensor Parallel

```sh
python tensor_parallel/shard_model.py
```
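The core idea behind tensor parallelism can be sketched as a column-parallel linear layer: the weight matrix is split column-wise across devices, each device computes a partial output on its shard, and gathering the partials reproduces the full layer output. The sketch below uses NumPy on CPU to show the math only — the function names are illustrative, not the ones in `shard_model.py`:

```python
import numpy as np

def shard_columns(w, n_shards=2):
    """Split a weight matrix column-wise into n_shards equal pieces."""
    return np.split(w, n_shards, axis=1)

def tensor_parallel_linear(x, shards):
    # Each "GPU" computes x @ W_i on its own shard; concatenating the
    # partial outputs stands in for the all-gather in a real multi-GPU run.
    partials = [x @ w for w in shards]
    return np.concatenate(partials, axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # batch of 4, hidden size 8
w = rng.normal(size=(8, 6))   # full linear layer: 8 -> 6

shards = shard_columns(w, 2)          # two column shards of shape (8, 3)
y_parallel = tensor_parallel_linear(x, shards)
y_full = x @ w                        # reference: unsharded computation

assert np.allclose(y_parallel, y_full)
```

The same decomposition extends to transformer blocks: attention heads and MLP columns split naturally across devices, with one collective per layer to reassemble activations.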

## Run Pipeline Parallel

```sh
python pipeline_parallel/pipe_runner.py
```
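Pipeline parallelism places different layers of the model on different stages and streams microbatches through them, so stages execute concurrently. A minimal two-stage sketch using only the standard library (threads and queues stand in for GPUs and gRPC; the stage functions are trivial placeholders, not the repo's `pipe_runner.py` logic):

```python
import queue
import threading

def stage1(inp_q, out_q):
    # First half of the "model": runs in its own thread, passes results on.
    while True:
        item = inp_q.get()
        if item is None:        # sentinel: shut down and propagate downstream
            out_q.put(None)
            return
        out_q.put(item * 2)     # placeholder for the first stage's layers

def stage2(inp_q, results):
    # Second half of the "model": consumes stage1 output as it arrives.
    while True:
        item = inp_q.get()
        if item is None:
            return
        results.append(item + 1)  # placeholder for the second stage's layers

q01, q12, results = queue.Queue(), queue.Queue(), []
t1 = threading.Thread(target=stage1, args=(q01, q12))
t2 = threading.Thread(target=stage2, args=(q12, results))
t1.start()
t2.start()

for microbatch in [1, 2, 3, 4]:   # feed microbatches, then the sentinel
    q01.put(microbatch)
q01.put(None)
t1.join()
t2.join()
# results == [3, 5, 7, 9]
```

Because stage2 starts on microbatch 1 while stage1 is already processing microbatch 2, the stages overlap; with more microbatches the pipeline "bubble" at startup amortizes away.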

## Monitor

- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
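Prometheus scrapes metrics as plain text in its exposition format. As a sketch of what the latency and GPU-utilization metrics might look like on the wire, here is a standard-library-only renderer (the metric names are assumptions; the repo presumably uses a Prometheus client library instead of hand-formatting):

```python
def render_metrics(latencies_s, gpu_util):
    """Render latency and GPU-utilization metrics in the Prometheus
    text exposition format, ready to serve from a /metrics endpoint."""
    lines = [
        "# HELP inference_latency_seconds Inference request latency.",
        "# TYPE inference_latency_seconds summary",
        f"inference_latency_seconds_sum {sum(latencies_s)}",
        f"inference_latency_seconds_count {len(latencies_s)}",
        "# HELP gpu_utilization_ratio GPU utilization (0 to 1).",
        "# TYPE gpu_utilization_ratio gauge",
        f"gpu_utilization_ratio {gpu_util}",
    ]
    return "\n".join(lines) + "\n"

# Example: two requests observed, GPU at 75% utilization.
text = render_metrics([0.12, 0.08], 0.75)
print(text)
```

Pointing a Prometheus scrape job at an endpoint serving this text is enough for the Grafana dashboard to chart latency and utilization over time.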
