aalok-p/watcher

Watcher

Real-time AI GPU health coach. Watches your GPU metrics, diagnoses bottlenecks, and explains fixes in plain English.

One-command setup

# Clone the repo 
git clone <repo-url>
cd watcher

# Install the NVIDIA Container Toolkit by running setup-nvidia-gpu.sh (skip if it is already installed)
bash setup-nvidia-gpu.sh

# Start with docker-compose
docker-compose up --build

Then open: http://localhost:8000

Watcher (Video Demo) - https://youtu.be/G8i196ag9CY?si=JXK_Bs2pw1fh0TzJ

Things to add:

  • Read metrics from nvidia-smi
  • Rule-based diagnosis
  • LLM-based diagnosis and reasoning
  • Add a vision SDK for monitoring
  • Monitor via Prometheus
  • Make a CLI version
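
As a rough illustration of the first two items, reading nvidia-smi and applying rule-based diagnosis could be sketched as below. This is not the project's implementation. The names `GpuSample`, `parse_samples`, `read_gpus`, and `diagnose` are hypothetical, and the thresholds (85 °C, 95% memory, 30% utilization) are assumptions chosen only for the example. The `--query-gpu` and `--format=csv,noheader,nounits` options are standard nvidia-smi flags.

```python
import csv
import io
import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total,temperature.gpu"


@dataclass
class GpuSample:  # hypothetical container for one GPU reading
    index: int
    util_pct: float
    mem_used_mib: float
    mem_total_mib: float
    temp_c: float


def parse_samples(csv_text: str) -> list[GpuSample]:
    """Parse 'csv,noheader,nounits' output; assumes every field is numeric."""
    samples = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        idx, util, used, total, temp = (cell.strip() for cell in row)
        samples.append(
            GpuSample(int(idx), float(util), float(used), float(total), float(temp))
        )
    return samples


def read_gpus() -> list[GpuSample]:
    """Run nvidia-smi once and return one sample per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_samples(result.stdout)


def diagnose(sample: GpuSample) -> list[str]:
    """Apply a few illustrative rules; thresholds are assumptions, not tuned values."""
    findings = []
    mem_frac = sample.mem_used_mib / sample.mem_total_mib
    if sample.temp_c >= 85:
        findings.append("high temperature: check cooling or expect throttling")
    if mem_frac >= 0.95:
        findings.append("memory nearly full: reduce batch size or model footprint")
    if sample.util_pct < 30 and mem_frac > 0.5:
        findings.append("low utilization with loaded memory: possible input bottleneck")
    return findings
```

Calling `diagnose(s)` for each `s` in `read_gpus()` returns a list of plain-English findings per GPU. Keeping the parser separate from the subprocess call means the rules can be exercised on captured text without a GPU present.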

About

Watches the GPU to avoid bottleneck issues.
