Implements:
- Tensor Parallelism across 2 GPUs
- Pipeline Parallelism across 2 stages
- gRPC message passing
- Prometheus metrics (latency, GPU utilization)
- Grafana dashboard
python tensor_parallel/shard_model.py
python pipeline_parallel/pipe_runner.py
localhost:9090 (Prometheus) localhost:3000 (Grafana)