This project packages SmolVLM2 into a ready-to-use Docker image with both a web UI and an HTTP API for image–text-to-text tasks.
It’s designed for fast setup and offline-friendly deployment so you can:
- Pull the container
- Run it with one command
- Start sending images and prompts instantly
Note: this image is not production-hardened; expect to do additional hardening work before using it in critical deployments.
✅ Offline Mode: Supported — models can be pre-baked into the image or mounted from local storage.
✅ Minimal GPU Requirements: Works on most modern NVIDIA GPUs with CUDA support and at least 4GB of VRAM (Ampere or newer recommended). Cards with less VRAM may need the 256M model variant.
🤝 Contributions Welcome: Bug fixes, feature requests, and PRs are appreciated.
If you encounter problems or have suggestions:
- Open a GitHub Issue with details, logs, and reproduction steps
- Use Discussions for general questions
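Before starting the container, it can help to compare each GPU's memory against the 4GB guideline above. The following is a minimal Python sketch and is not part of this project: it assumes nvidia-smi is on the PATH, and the helper names and the 4096 MiB threshold are illustrative choices rather than anything the image enforces.

```python
import subprocess
from typing import List

# Assumption: "at least 4GB VRAM" from the README is read as 4096 MiB.
MIN_VRAM_MIB = 4 * 1024


def parse_vram_mib(nvidia_smi_output: str) -> List[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi CSV output."""
    return [
        int(line.strip())
        for line in nvidia_smi_output.splitlines()
        if line.strip()
    ]


def query_vram_mib() -> List[int]:
    """Ask nvidia-smi for the total memory of each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_vram_mib(result.stdout)


def meets_guideline(vram_mib: int) -> bool:
    """True if a GPU has at least the assumed 4096 MiB of memory."""
    return vram_mib >= MIN_VRAM_MIB
```

Calling query_vram_mib() on the host returns one value per GPU. Any card for which meets_guideline() is false is a candidate for the 256M model variant.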
Run the container:
docker run --gpus all -p 8888:8888 sensejworld/smolvlm:latest
To use a specific GPU (here, GPU index 1), set CUDA_VISIBLE_DEVICES:
docker run --gpus all -e CUDA_VISIBLE_DEVICES=1 -p 8888:8888 sensejworld/smolvlm:latest
Then open: http://localhost:8888/ui
curl -X POST "http://localhost:8888/ptt/convert" \
-H "Accept: application/json" \
-F "query=What is in this image?" \
-F "image=@cat.jpg;type=image/jpeg"
Or without an image (uses demo image):
curl -X POST "http://localhost:8888/ptt/convert" \
-F "query=Count cats in the image?"
- Pinned dependencies for reproducibility
- GPU acceleration (CUDA)
- Preloaded model option for full offline usage
- Single container with UI + API
- Clean multi-stage build
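The /ptt/convert calls shown earlier with curl can also be made from Python. This is a minimal sketch using only the standard library. The endpoint URL and the form field names (query, image) come from the curl examples. The helper names are hypothetical. The response schema is not documented here, so the sketch assumes a JSON body and returns it unmodified.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Any, Optional, Tuple

# Endpoint taken from the curl examples; adjust host and port to your setup.
API_URL = "http://localhost:8888/ptt/convert"


def build_multipart(query: str, image_path: Optional[str] = None) -> Tuple[bytes, str]:
    """Build a multipart/form-data body with a query and an optional image."""
    boundary = uuid.uuid4().hex
    chunks = [
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="query"\r\n\r\n'
            f"{query}\r\n"
        ).encode("utf-8")
    ]
    if image_path is not None:
        path = Path(image_path)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="image"; filename="{path.name}"\r\n'
                f"Content-Type: {mime}\r\n\r\n"
            ).encode("utf-8")
        )
        chunks.append(path.read_bytes())
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def ask(query: str, image_path: Optional[str] = None,
        url: str = API_URL, timeout: float = 120.0) -> Any:
    """POST the form to the API and return the parsed JSON (assumed format)."""
    body, content_type = build_multipart(query, image_path)
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, ask("What is in this image?", "cat.jpg") mirrors the first curl call. Leaving out the image path sends only the query, in which case the server falls back to its demo image.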
Find available images on Docker Hub — useful for:
- Selecting a specific version
- Checking the latest builds before pulling
- Matching CUDA/PyTorch versions to your needs
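Tags can also be listed programmatically. The sketch below assumes the repository name sensejworld/smolvlm (taken from the docker run commands) and Docker Hub's public v2 tags endpoint. The response fields it reads (results, name, last_updated) are assumptions about that API's current shape, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

# Assumed public Docker Hub endpoint for this repository's tags.
TAGS_URL = "https://hub.docker.com/v2/repositories/sensejworld/smolvlm/tags?page_size=25"


def extract_tags(payload: dict) -> List[Tuple[str, Optional[str]]]:
    """Return (tag name, last_updated) pairs from a tags response payload."""
    return [
        (item["name"], item.get("last_updated"))
        for item in payload.get("results", [])
    ]


def fetch_tags(url: str = TAGS_URL, timeout: float = 30.0) -> List[Tuple[str, Optional[str]]]:
    """Download the first page of tags and extract their names and dates."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_tags(json.load(response))
```

fetch_tags() needs network access and reads only the first page of results, which is usually enough to pick a version before pulling.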
- Batch image support
- Video support
- Initial release
- Basic SmolVLM2 inference via API & UI
- Demo image support
- Docker GPU support
- Added FastAPI-based /ptt/convert endpoint
- Integrated Gradio chat UI with image upload
- Shared inference worker between API and UI
- Offline model prefetch script (init_downloads.py)
Run with:
docker run --gpus all -p 8888:8888 sensejworld/smolvlm:v0.0.1
See notes.md for:
- Local Docker build/run instructions
- Conda setup for local dev
- FlashAttention verification commands
- API & UI testing commands
This project is licensed under the MIT License.
Models by HuggingFaceTB — check their respective licenses.
