A local terminal AI assistant built on GGUF models and llama‑cpp‑python. No cloud. Complete privacy.
The application runs completely offline, using only local GGUF models, so nothing ever leaves your machine and there is no internet dependency. It lets you chat with a model directly in the terminal, supports loading models from the models/ directory, automatic model loading, response generation, and much more. It also includes a built-in lightweight HTTP server that exposes an API so other applications can communicate with the model.
- Python 3.11+ required.
- Download a GGUF model and copy it into models/.
- Run the application from the terminal (cmd), or double-click run.py to launch CMDAI.
- Load a model with /load, then talk to it directly in the terminal.
- /help - show help
- /load - choose a model from the list
- /pause - enable or disable pause while talking to the model
- /unload - unload the model from RAM
- status - app status
- version - llama-cpp-python version
- update - runtime update
- GET /tags
- POST /generate
- POST /chat
- POST /pull
- GET /version
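The endpoints above can be called from any HTTP client. Below is a minimal Python sketch (standard library only) that builds a POST /generate request. The base URL and the JSON field names ("model", "prompt") are assumptions, not taken from CMDAI's source, so check the server's actual address and request format before relying on them.

```python
import json
from urllib import request

# Assumed default address -- check CMDAI's startup output for the real one.
BASE_URL = "http://127.0.0.1:8000"

def build_generate_request(model: str, prompt: str) -> request.Request:
    """Build (but do not send) a POST /generate request.

    The payload keys "model" and "prompt" are assumptions about the API.
    """
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return request.Request(
        BASE_URL + "/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("my-model.gguf", "Hello!")
print(req.full_url)      # http://127.0.0.1:8000/generate
print(req.get_method())  # POST

# To actually send it (requires the CMDAI server to be running):
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Sending the request is left commented out so the sketch runs without a live server; swap in your own model name and prompt when trying it for real.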
CMDAI is a lightweight, transparent, and developer‑friendly tool for running local GGUF models without the overhead of large frameworks.
- Lightweight architecture — no containers, servers, or heavy dependencies.
- Terminal‑first workflow — clear, predictable behavior with full control.
- Fast model switching — load and unload models instantly without restarting.
- Built‑in API server — easy integration with apps, scripts, and automation.
- Fully offline — all processing happens locally for maximum privacy.
- Simpler than Ollama, lighter than LM Studio — minimal overhead, maximum flexibility.
- Developer‑oriented design — clean structure, CLI tools, API access, and an open roadmap.
Thank you for your interest in improving CMDAI! Here are simple ways you can support the project.
- Report bugs in the Issues tab
- Suggest new features
- Test the application on your system
- Share improvement ideas
- Add and test new GGUF models
- Go to the Issues tab
- Click New Issue
- Choose: Bug / Feature
- Describe the problem or suggestion clearly
- Go to Discussions
- Write your suggestions in the How To Help category
Every contribution is appreciated!
- models/ and large local binaries are excluded from the repository by .gitignore.
- This project is designed to run fully local models.
- The application works with just the models/ directory and the run.py file.
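A .gitignore matching the note above could be as short as this (the exact patterns are an assumption; *.gguf is one way to catch large model binaries outside models/):

```gitignore
# keep downloaded models and other large binaries out of the repo
models/
*.gguf
```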
This project was inspired by tools like Ollama, LM Studio and Claude Code, which showed how powerful and accessible local AI models can be.
I wanted to bring a similar experience directly into the command line — simple, fast, and fully local — without the need for heavy interfaces or external services.
CMD LOCAL AI is my attempt to create a clean, terminal‑native way to interact with AI models.
CMD LOCAL AI is a project being developed step by step. Below are the directions I plan to pursue in future versions:
- Support for multiple models simultaneously - switching between local models without restarting the application.
- Performance profiling - real-time view of generation time, RAM usage, and CPU usage.
- Developer Tools mode - viewing system prompts, tokens, and raw model responses.
- Integration with plugins/extensions - the ability to add custom commands and actions performed by the AI.
- Support for audio and vision models - speech and image support for compatible local models.