SmolVLM — Easy-to-Run Vision-Language Model 🖼️💬

This project packages SmolVLM2 into a ready-to-use Docker image with both a web UI and an HTTP API for image–text-to-text tasks.

It’s designed for fast setup and offline-friendly deployment so you can:

Pull the container
Run it with one command
Start sending images and prompts instantly

⚠️ Note: Maintained by an independent developer for ease-of-use.
Not production-hardened — you may need extra work for critical deployments.

✅ Offline Mode: Supported — models can be pre-baked into the image or mounted from local storage.
✅ Minimal GPU Requirements: Works on most modern NVIDIA GPUs with CUDA support and at least 4 GB of VRAM (Ampere or newer recommended). Cards with less VRAM may need the 256M model variant.

🤝 Contributions Welcome: Bug fixes, feature requests, and PRs are appreciated.

🆘 Support & Issues

If you encounter problems or have suggestions:

Open a GitHub Issue with details, logs, and reproduction steps
Use Discussions for general questions

🚀 Quick Start

Standard run

docker run --gpus all -p 8888:8888 sensejworld/smolvlm:latest

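Multi-GPU host (CUDA_VISIBLE_DEVICES=1 restricts the container to the second GPU; change the index to pick another device)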

docker run --gpus all -e CUDA_VISIBLE_DEVICES=1 -p 8888:8888 sensejworld/smolvlm:latest

Then open: http://localhost:8888/ui
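
Depending on hardware, the container may need some time after `docker run` before it answers requests. The following is a minimal sketch of a readiness check, assuming the UI path from the Quick Start responds to plain HTTP requests once the server is up. The function name `wait_for_smolvlm` and its arguments are illustrative and not part of this project.

```shell
# Hypothetical helper (not shipped with the image): poll a URL until it responds.
# Arguments: URL (default: the Quick Start UI), max attempts, seconds between attempts.
wait_for_smolvlm() {
  url="${1:-http://localhost:8888/ui}"
  attempts="${2:-60}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors (status >= 400);
    # connection failures also return non-zero.
    if curl -s -f -o /dev/null --max-time 2 "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

Typical use after starting the container: `wait_for_smolvlm && echo "open http://localhost:8888/ui"`.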

🌐 API Usage Example

curl -X POST "http://localhost:8888/ptt/convert" \
  -H "Accept: application/json" \
  -F "query=What is in this image?" \
  -F "image=@cat.jpg;type=image/jpeg"

Or without an image (uses demo image):

curl -X POST "http://localhost:8888/ptt/convert" \
  -F "query=How many cats are in the image?"
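
For scripting, the two calls above can be wrapped in one small function. This is a sketch under stated assumptions: the endpoint and the `query` and `image` field names come from the examples above, while the function name `ask_smolvlm` and the `SMOLVLM_URL` variable are made up for illustration. The response format is not documented here, so the function simply prints whatever the server returns.

```shell
# Hypothetical wrapper around the /ptt/convert endpoint shown above.
# SMOLVLM_URL is an assumed variable name; it defaults to the Quick Start port.
SMOLVLM_URL="${SMOLVLM_URL:-http://localhost:8888}"

ask_smolvlm() {
  query="${1:-}"
  image="${2:-}"
  if [ -z "$query" ]; then
    echo "usage: ask_smolvlm QUERY [IMAGE_FILE]" >&2
    return 2
  fi
  if [ -n "$image" ]; then
    # Fail early with a clear message instead of a curl error.
    if [ ! -f "$image" ]; then
      echo "image not found: $image" >&2
      return 1
    fi
    # --form-string sends the query literally, even if it starts with @ or <.
    curl -sS -f -X POST "$SMOLVLM_URL/ptt/convert" \
      -H "Accept: application/json" \
      --form-string "query=$query" \
      -F "image=@$image"
  else
    # No image given: per the example above, the server uses its demo image.
    curl -sS -f -X POST "$SMOLVLM_URL/ptt/convert" \
      -H "Accept: application/json" \
      --form-string "query=$query"
  fi
}
```

Example: `ask_smolvlm "What is in this image?" cat.jpg`. Because of `-f`, the function returns a non-zero status when the server replies with an HTTP error, which makes it usable in `if` conditions and pipelines.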

📦 Docker Features

Pinned dependencies for reproducibility
GPU acceleration (CUDA)
Preloaded model option for full offline usage
Single container with UI + API
Clean multi-stage build

🐳 Docker Hub

Find available images on Docker Hub — useful for:

Selecting a specific version
Checking the latest builds before pulling
Matching CUDA/PyTorch versions to your needs

📜 Version History

v0.0.2 (Planned)

Batch image support
Video support

v0.0.1

Initial release
Basic SmolVLM2 inference via API & UI
Demo image support
Docker GPU support
Added FastAPI-based /ptt/convert endpoint
Integrated Gradio chat UI with image upload
Shared inference worker between API and UI
Offline model prefetch script (init_downloads.py)

Run with:

docker run --gpus all -p 8888:8888 sensejworld/smolvlm:v0.0.1

🛠 Developer Notes

See notes.md for:

Local Docker build/run instructions
Conda setup for local dev
FlashAttention verification commands
API & UI testing commands

📜 License

This project is licensed under the MIT License.
Models by HuggingFaceTB — check their respective licenses.
