Welcome to MySpinBot, the open-source, containerized platform for generating short personalized AI videos of talking “bots” — the kind that spin their way through voice cloning, LoRA training, and lip-syncing, all while making an effort not to melt your GPU.
Think of it as a creative factory where:
- You upload one or more images and a short audio voice clip, along with some reference text,
- You prompt a local LLM to create a proper narrative and even a scene description,
- Then it creates the character and a setting[^1] for a portrait image, trains a voice clone and optionally a mini-LoRA[^1],
- And finally it stitches everything together by generating a staged, narrated, lip-synced video.
All this locally, privately, and under your control. Here's a sample output.
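The steps above can be sketched as a simple staged pipeline. The following is a minimal illustration in plain Python — the stage names and data fields are hypothetical and do not correspond to the actual MySpinBot codebase:

```python
# Minimal sketch of a MySpinBot-style staged pipeline.
# All stage names and fields are illustrative, not the project's real API.

def build_narrative(state):
    # Would prompt a local LLM (e.g. via Ollama) for a narrative + scene description.
    state["narrative"] = f"A story based on: {state['reference_text']}"
    return state

def create_portrait(state):
    # Would run a ComfyUI diffusion pipeline (optionally with a mini-LoRA).
    state["portrait"] = "portrait.png"
    return state

def clone_voice(state):
    # Would train/apply a TTS voice clone from the uploaded audio clip.
    state["voice"] = f"voice_clone_of({state['audio_clip']})"
    return state

def lip_sync(state):
    # Would stitch portrait + narration into a lip-synced video.
    state["video"] = "final_video.mp4"
    return state

STAGES = [build_narrative, create_portrait, clone_voice, lip_sync]

def run_pipeline(state):
    for stage in STAGES:
        state = stage(state)
    return state
```

In the real system these stages run as distributed workflow tasks rather than plain function calls, but the data-flow idea is the same.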
I took this on after watching Computerphile’s MikeBot3000: Can We Build an AI Mike from Open Source Tools? some months ago, and used all the help I could get from my trusty old pals ChatGPT-5 and Gemini 2.5 to make it spin — figuratively, literally, and sometimes uncontrollably.
My motivation was primarily educational, but I also needed a facility in my home lab to act as an AI workbench — that is, to enable me to do things like examine the inner workings of complex AI pipelines at the code level, test new models that keep popping up (especially in the ComfyUI ecosystem and in the domain of Computer Vision), and play around with local agents. At the same time, I wanted the ability to quickly produce end-to-end prototypes (which is why I developed the Next.js/React frontend layer of the platform).
Hence, I developed a fully local platform which brings many open-source components together into a unified AI infrastructure that one can freely use without the need to resort to paid services. Extra details, along with a more thoroughly-written (that is, AI-generated[^2] 😎) project description, can be found here.
If you start digging into the project, you will find that this platform is not just mere video-generation automation, but a reference AI infrastructure deployment and a foundation that enables things like:
- Defining/orchestrating local agents
- Analysing and optimizing existing models and pipelines
- Prototyping/researching models and workflows
- and much more
For example, one of my focus points in building this was to put in place all the facilities required to develop and test CUDA/PyCUDA optimizations of existing PyTorch workflows in end-to-end use cases.
MySpinBot's dual-plane orchestration layer makes it a platform that supports distributed processing. In combination with LangGraph and common data schemas, it can manage, publish, and execute heterogeneous workflows, i.e. workflows containing tasks to be executed by Node.js and Python processes (targeting both CPU and GPU execution). At the same time, it provides the means for LLM agents (own-managed or external) to plan and compile such workflows, since its backend components advertise their capabilities via well-defined service manifests[^3].
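To illustrate the idea of capability-based planning — with manifest fields and task names that are hypothetical, not the project's actual schema — a planner could match workflow tasks against advertised service capabilities like this:

```python
# Hypothetical sketch of capability-based workflow planning.
# The manifest structure and task names are illustrative only.

MANIFESTS = {
    "node-orchestrator": {"runtime": "node", "capabilities": ["narrative", "scene"]},
    "python-worker":     {"runtime": "python", "capabilities": ["tts", "lora", "lipsync"]},
}

def plan_workflow(tasks):
    """Assign each task to the first service advertising that capability."""
    plan = []
    for task in tasks:
        for service, manifest in MANIFESTS.items():
            if task in manifest["capabilities"]:
                plan.append((task, service, manifest["runtime"]))
                break
        else:
            raise ValueError(f"No service advertises capability: {task}")
    return plan
```

For example, `plan_workflow(["narrative", "tts", "lipsync"])` would route the narrative task to the Node.js plane and the TTS/lip-sync tasks to the Python plane.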
“Deep breath...”
The system architecture features a React UI and a Node.js orchestration layer, which manages LangGraph distributed workflows across both Node.js and Python runtimes. The backend integrates specialized AI facilities, including ComfyUI diffusion pipelines and Ollama-hosted local LLMs. High-intensity machine learning tasks—such as LoRA training, Text-to-Speech (TTS), and lip-syncing—are handled by Python-based workers managed via Dramatiq[^4]. To ensure production-grade reliability, the entire ecosystem is supported by a full observability suite using Prometheus and Grafana. To ensure robustness, the orchestration layer is built on top of Redis and is backed by a PostgreSQL persistence layer; the latter can also act as a vector database to facilitate components like RAG[^5]. Furthermore, to manage input/output/staged data artifacts the system contains its own MinIO object storage, and finally, it all comes together via a Traefik proxy which manages routing and publishes the service endpoints of both the underlying infrastructure and the custom backends.
The full set of project documents lives in `/docs` — it explains everything from why this madness exists to how it will be contained.
The `/docs` directory also functions as a repository for providing context to coding agents. As better described here, the project is developed using VS Code and the Dev Containers extension. The latter provides proper sandboxing, so one can deploy coding agents and let them go nuts if one likes. The idea is that initially the agent gets in character by examining the root-level GEMINI.md, which triggers a context bootstrap process like the one described here, making it go through the documentation in the proper order and fill its context window. Provisions have also been made for orchestrating different agents from their dev containers, centrally, by a master-architect agent[^6]. At this point, agents in separate sandboxes are linked via a shared directory, enabling cross-container communication and coordination.
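As a rough illustration of such a bootstrap step — the file names and reading order below are hypothetical, not the project's actual `/docs` layout — an agent could assemble its context like this:

```python
# Hypothetical sketch of an agent context-bootstrap step:
# read documentation in a defined order until a context budget is exhausted.

READING_ORDER = [  # illustrative file names, not the real /docs layout
    "GEMINI.md",
    "docs/project-description.md",
    "docs/architecture.md",
    "docs/history.md",
]

def bootstrap_context(docs, budget_chars=10_000):
    """Concatenate docs in order, truncating once the budget is reached.

    `docs` maps file names to their text contents.
    """
    context, used = [], 0
    for name in READING_ORDER:
        text = docs.get(name, "")
        remaining = budget_chars - used
        if remaining <= 0:
            break
        chunk = text[:remaining]  # truncate to what the budget still allows
        context.append(f"## {name}\n{chunk}")
        used += len(chunk)
    return "\n\n".join(context)
```

A real bootstrap would of course count tokens rather than characters and feed the result into the agent's system context, but the ordered, budgeted traversal is the core of the idea.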
For the human reader though, it is best to start from the documentation index or the project description.
There are basically four things you need to take care of in order to get this beast up and running. First, make sure your system covers the prerequisites. Then, clone the repo locally. After that, execute the provisioning script for your platform (Linux, Windows). Finally, use Compose to start the infrastructure. That's it!
**Prerequisites**
- Docker with Compose
- A beefy system with at least 64 GB RAM and lots of cores for the containers.
- An NVIDIA GPU with at least 16 GB of VRAM.
- NVIDIA drivers + NVIDIA Container Toolkit
- Local DNS or hostname setup so that `*.myspinbot.local` resolves to your Docker host
For reference the project has been developed and tested on a system with:
- Windows 11
- Docker Desktop >v4.55
- 96 GB RAM
- RTX 5070 Ti 16 GB VRAM
- Core Ultra 9 285K (24 Cores)
💡 If you are using Docker Desktop, make sure to give it enough of your system resources. My `.wslconfig` looks like this:

```ini
[wsl2]
memory=80GB
processors=16
swap=16GB
```
**Clone this repository**

```shell
git clone https://github.com/ktzevani/myspinbot.git
```
**Run the provisioning script**

`cd` into the newly cloned workspace and, depending on your OS, run one of the commands below, replacing the passed parameters with the values you desire.

Linux:

```shell
AUTH_USER=myuser AUTH_PASS=SuperSecret DOMAIN=myspinbot.local DB_NAME=myspinbot ./scripts/provision_secrets.sh
```

Windows (PowerShell):

```powershell
$env:AUTH_USER = "myuser"
$env:AUTH_PASS = "SuperSecret"
$env:DOMAIN = "myspinbot.local"
$env:DB_NAME = "myspinbot"
.\scripts\provision_secrets.ps1
```
More information can be found here.
**Start the MySpinBot infrastructure (Production)**

```shell
docker compose -f docker-compose.yml up -d
```
🎉 Your local MySpinBot instance should start spinning!
❗ Upon the first invocation of Compose, Docker images need to be pulled locally, and MySpinBot-specific custom images (backend, frontend, worker, sidecars, etc.) need to be built locally. This will take some time, especially for the comfyui/worker custom image, so you need to be patient...
❗ Upon initialization of the infrastructure, and after the Docker images are pulled/built into the local registry, the downloader sidecar will kick in and try to fetch more than 60 GB worth of data: all the required models. So you need to be very patient...
❗ During the first invocation of the video generation pipeline, things might be a bit slower than usual, because additional downloads will occur.
❗ As all internal dashboards are accessed through local subdomains, you must configure your DNS resolution (e.g. `/etc/hosts` on Linux/macOS, `C:\Windows\System32\drivers\etc\hosts` on Windows, or, if you are running your own custom DNS server, configure it accordingly) so that these hostnames resolve to the proper IP of the Docker host running the MySpinBot stack — typically `127.0.0.1` when running locally. Example hosts file entries on Windows:

```
127.0.0.1 api.myspinbot.local
127.0.0.1 proxy.myspinbot.local
127.0.0.1 grafana.myspinbot.local
127.0.0.1 prometheus.myspinbot.local
127.0.0.1 ui.myspinbot.local
127.0.0.1 redis.myspinbot.local
127.0.0.1 s3.myspinbot.local
127.0.0.1 pgadmin.myspinbot.local
127.0.0.1 openwebui.myspinbot.local
127.0.0.1 comfyui.myspinbot.local
```

Without proper hostname resolution, Traefik routing and TLS certificate validation will fail.
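To double-check your entries, a small helper like the following — a sketch, not part of the project — can verify that a hosts-file text covers all the required subdomains:

```python
# Sketch: verify that the required MySpinBot hostnames appear in a hosts file.
# The subdomain list mirrors the hosts example above; pass a different
# `domain` if you changed the DOMAIN parameter at provisioning time.

REQUIRED_SUBDOMAINS = [
    "api", "proxy", "grafana", "prometheus", "ui",
    "redis", "s3", "pgadmin", "openwebui", "comfyui",
]

def missing_hostnames(hosts_text, domain="myspinbot.local"):
    """Return the required hostnames absent from the hosts-file text."""
    present = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]  # strip comments
        parts = line.split()
        if len(parts) >= 2:
            present.update(parts[1:])  # every hostname after the IP
    return [f"{sub}.{domain}" for sub in REQUIRED_SUBDOMAINS
            if f"{sub}.{domain}" not in present]
```

Read your hosts file and print `missing_hostnames(text)`; an empty list means you are good to go.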
When the production infrastructure goes up, you will have access to all the included facilities. If you kept the DOMAIN parameter at the provisioning stage equal to `myspinbot.local`, the following links should work out of the box (you need to use the configured credentials):
- Frontend: https://ui.myspinbot.local
- API: https://api.myspinbot.local
- Health endpoint: https://api.myspinbot.local/health
- Get-Capabilities endpoint: https://api.myspinbot.local/api/capabilities
- Traefik Dashboard: https://proxy.myspinbot.local/
- Grafana: https://grafana.myspinbot.local
- Prometheus: https://prometheus.myspinbot.local
- Redis Insight: https://redis.myspinbot.local
- MinIO Panel/Console: https://s3.myspinbot.local
- pgAdmin Panel (PostgreSQL): https://pgadmin.myspinbot.local
- ComfyUI Workspace: https://comfyui.myspinbot.local
- OpenWebUI Portal7: https://openwebui.myspinbot.local
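If you provisioned with a different DOMAIN, the same endpoints simply move under that domain. As a small sketch — the subdomain-to-service mapping below just mirrors the list above:

```python
# Sketch: derive the MySpinBot service URLs for an arbitrary DOMAIN value.
SERVICES = {
    "Frontend": "ui", "API": "api", "Traefik Dashboard": "proxy",
    "Grafana": "grafana", "Prometheus": "prometheus", "Redis Insight": "redis",
    "MinIO Console": "s3", "pgAdmin": "pgadmin",
    "ComfyUI": "comfyui", "OpenWebUI": "openwebui",
}

def service_urls(domain="myspinbot.local"):
    """Map service names to their HTTPS endpoints under the given domain."""
    return {name: f"https://{sub}.{domain}" for name, sub in SERVICES.items()}
```

For instance, `service_urls("example.local")["Frontend"]` yields `https://ui.example.local`.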
| Phase | Title | Status |
|---|---|---|
| 🧊 Subzero | Repo setup + docs freeze | ✅ Completed |
| 0 | Infra Bootstrap (Traefik + Monitoring) | ✅ Completed |
| 1 | Backend & Frontend Scaffold | ✅ Completed |
| 2 | Worker Integration & Dual-plane Orchestration | ✅ Completed |
| 3 | AI Pipeline Implementation | ✅ Completed |
| 4 | Quality & Observability | 🕓 Pending |
| 5 | Polish & Docs | ⏳ Planned |
See `history.md` for how the architecture evolved across development phases.
- Local-First. No data leaves your machine.
- Metrics-First. Every container speaks Prometheus.
- Open-Source. Every dependency is transparent and replaceable.
- GPU-Aware. Jobs are serialized to protect your VRAM’s dignity.
- Humor-Tolerant. Because debugging diffusion pipelines without sarcasm is impossible.
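The GPU-aware principle boils down to serializing GPU-bound jobs so only one touches the GPU at a time, while everything else runs freely. A minimal in-process illustration in plain Python follows — the real project coordinates this across containers via its orchestration layer, not with a thread lock:

```python
import threading
import time

# Sketch: serialize GPU-bound jobs behind a single lock.
gpu_lock = threading.Lock()
event_log = []

def gpu_job(name, duration=0.01):
    with gpu_lock:  # only one GPU job may hold the device at a time
        event_log.append((name, "start"))
        time.sleep(duration)  # stand-in for GPU work (e.g. a diffusion step)
        event_log.append((name, "end"))

# Launch three "GPU" jobs concurrently; the lock forces them to run one by one.
threads = [threading.Thread(target=gpu_job, args=(f"job{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After running, `event_log` shows strictly non-overlapping start/end pairs, which is exactly the dignity your VRAM deserves.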
Contributions, ideas, and sarcastic bug reports are welcome.
Open an issue, start a discussion, or send a pull request.
Be kind — we’re all just trying to make our bots talk before the GPU fans hit Mach 3.
This project is licensed under the AGPL-3.0 License — see the LICENSE file for details.
Phase 4 is on its way, brace yourself: the spin just went interactive. May your API routes be fast, your Redis never block, and your GPU stay smugly at 42 °C.
Footnotes

[^1]: Still in TODOs. The current version supports uploading an existing portrait image instead of prompting for character creation or LoRA training.

[^2]: At this point, I'd like to stress that... I genuinely, with my own eyes and keyboard, reviewed and edited... most of the documentation/code that the AI produced.

[^3]: Some more work remains to be done there; the foundational components are all laid out though, and are functional.

[^4]: Dramatiq is yet to be introduced; right now Python tasks are managed by the running process and not by actors.

[^5]: The current implementation doesn't employ Retrieval-Augmented Generation, because the LLM context needs of the planned pipelines are minimal.

[^6]: I haven't set it in motion on my workstation yet, because I'm still lacking a properly configured, own-managed local MCP server.

