Your personal AI companion on Telegram — powered by any LLM, fully self-hosted.
mai-gram turns any Telegram bot into a smart AI companion backed by 200+ LLM models via OpenRouter. Each user picks their own model, system prompt, and conversation style — the AI remembers facts through a built-in knowledge base (wiki), supports tool calling, and streams responses in real time. Deploy it on your own hardware and keep full control of your data.
| Any model, one interface | Switch between GPT-4o, Claude, Gemini, Llama, and 200+ other models without leaving Telegram. |
| Persistent memory | The AI builds a wiki of facts about each user — names, preferences, conversation history — and recalls them automatically. |
| Multi-bot, multi-user | Run 20+ bots from a single process, each with its own user whitelist, model restrictions, and prompt templates. |
| Conversation import | Migrate existing chats from other AI tools — upload a JSON file via Telegram or CLI and the full history is replayed into the bot. |
| Per-prompt configuration | Each system prompt template controls which tools are available, whether reasoning is visible, and how timestamps are displayed. |
| Self-hosted & private | All data stays on your machine. No cloud dependencies beyond the LLM API itself. |
| Structured output | Response template plugins (XML, JSON, markdown) constrain LLM output into validated structured formats — reasoning is preserved across turns instead of being silently truncated. |
| High-quality codebase | 717 tests across 4 tiers (unit → integration → functional → live), 92%+ coverage enforced on every commit, mypy strict, ruff, code-size audit, Docker, full CLI. |
git clone https://github.com/mikhailsal/mai-gram.git
cd mai-gram
make install-dev
cp .env.example .env
# Edit .env: set TELEGRAM_BOT_TOKEN and OPENROUTER_API_KEY
make run
Open your Telegram bot, send /start, pick a model and a system prompt, and you're chatting.
For development with auto-reload:
make run-dev
Responses stream token-by-token with real-time message editing in Telegram, just like ChatGPT but in your messenger. Long responses automatically overflow into multiple messages, with progressive streaming across message boundaries. Tool calls are displayed as separate blockquote messages for transparency. LaTeX symbols, headers, and nested lists are rendered with proper Unicode formatting.
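The overflow into multiple messages can be pictured with a minimal sketch. It assumes only Telegram's 4096-character limit for a text message; the function name `split_for_telegram` is hypothetical, and the project's real splitting logic (which also has to handle streaming and formatting) may differ.

```python
TELEGRAM_LIMIT = 4096  # maximum characters in one Telegram text message


def split_for_telegram(text: str, limit: int = TELEGRAM_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Breaks at the last newline or space before the limit when one exists,
    otherwise falls back to a hard split.
    """
    chunks = []
    while len(text) > limit:
        cut = max(text.rfind("\n", 0, limit), text.rfind(" ", 0, limit))
        if cut <= 0:
            cut = limit  # no natural break point: hard split
        chunks.append(text[:cut])
        text = text[cut:].lstrip()  # drop the separator we broke on
    if text:
        chunks.append(text)
    return chunks
```

Each chunk would then be sent (or progressively edited) as its own message.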
The AI automatically saves important facts about each user into a personal wiki. Entries are stored as plain .md files on disk (the source of truth) and indexed in SQLite for fast querying. You can browse, search, or edit wiki files manually — changes are picked up automatically.
data/<chat_id>/wiki/
├── 9999_human_name.md
├── 7000_profession.md
├── 5000_favorite_topics.md
└── 3000_recent_context.md
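Because entries are plain files, they are easy to inspect from a script. The sketch below splits file names like the ones above into a numeric prefix and a key, then sorts by prefix. Treating the prefix as an importance score is an assumption (the layout suggests it, but nothing here confirms it), and both helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def split_entry_name(path: Path) -> tuple[int, str]:
    """Split '9999_human_name.md' into (9999, 'human_name')."""
    prefix, _, key = path.stem.partition("_")
    return int(prefix), key


def by_prefix_desc(paths: Iterable[Path]) -> list[tuple[int, str]]:
    """Return (prefix, key) pairs, highest numeric prefix first."""
    return sorted((split_entry_name(p) for p in paths), reverse=True)
```

For a real chat directory, the input could be `Path("data/<chat_id>/wiki").glob("*.md")` with the actual chat id filled in.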
Define all your bots in a single config/bots.toml file. Each bot can have its own:
- User whitelist — who can use it
- Model restrictions — which LLMs are available
- Prompt restrictions — which system prompts can be selected
[[bots]]
token = "123456:ABC-DEF..."
allowed_users = [111111111]
# Let this user type arbitrary model ids + params (see "Arbitrary models" below)
custom_model_allowed_users = [111111111]
[[bots]]
token = "789012:GHI-JKL..."
allowed_users = [222222222]
allowed_models = ["google/gemini-2.5-flash"]
allowed_prompts = ["default", "coder"]
Users listed in a bot's custom_model_allowed_users get a "Custom model (type your own)" option during /start and /model. They can pick any OpenRouter model id that isn't in config/models.toml and supply request parameters in a chat message: the first line is the model id, and each following key = value line becomes a request parameter (dotted keys nest, values are type-coerced):
openai/gpt-5.4-mini
reasoning.effort = "high"
temperature = 0.7
provider.order = ["OpenAI"]
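A minimal sketch of how such a message could be parsed, to make the "dotted keys nest, values are type-coerced" rule concrete. It is not the project's parser: `parse_custom_model` and `coerce` are hypothetical names, and using JSON syntax for coercion (with a fallback to the raw string) is an assumption.

```python
import json
from typing import Any


def coerce(raw: str) -> Any:
    """Assumed rule: JSON literals become typed values, anything else stays a string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def parse_custom_model(message: str) -> tuple[str, dict[str, Any]]:
    """First line is the model id; each later 'key = value' line is a parameter."""
    lines = [ln.strip() for ln in message.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty message")
    model_id, params = lines[0], {}
    for line in lines[1:]:
        key, sep, raw = line.partition("=")
        if not sep:
            raise ValueError(f"expected key = value, got: {line!r}")
        *parents, leaf = key.strip().split(".")
        node = params
        for part in parents:  # 'reasoning.effort' -> {'reasoning': {'effort': ...}}
            node = node.setdefault(part, {})
        node[leaf] = coerce(raw.strip())
    return model_id, params
```

Applied to the message above, this yields the model id plus a nested dictionary with `reasoning`, `temperature`, and `provider` keys.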
The capability is opt-in per bot and per user; if custom_model_allowed_users
is omitted, the option is hidden for everyone on that bot. From the CLI:
mai-chat -c test-demo --start --custom-model openai/gpt-5.4-mini \
  --custom-model-params reasoning.effort=high temperature=0.7 --prompt default
Each system prompt template is paired with a TOML config file that controls:
- Tool visibility — enable/disable specific tools or MCP servers per prompt
- Display defaults — show or hide reasoning and tool calls
- Datetime behavior — whether messages include timestamps
This means a "creative writing" prompt can hide technical tool calls, while a "coder" prompt shows everything for full transparency.
Most LLM providers strip reasoning tokens from the conversation history before sending it back to the model. If you rely on native reasoning output, the model loses its chain of thought on every subsequent turn. This was verified empirically across multiple providers and models in ai-thought-preserved-bench.
Response templates solve this by moving reasoning into regular message content via a structured format. Each template injects format instructions into the system prompt, parses the response fields (e.g. <thought> and <content> in XML), validates them — retrying if malformed — and renders each field separately in Telegram. Because the reasoning is stored as content, not as a provider-specific metadata field, it survives round-trips and stays visible to the model on the next turn.
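The parse, validate, and retry loop can be sketched for the XML case as follows. This is an illustration under assumptions, not the plugin's code: the function names are hypothetical, the retry count is arbitrary, and `ask` stands in for whatever calls the LLM.

```python
import re
from typing import Callable, Optional

# Matches <thought>...</thought> and <content>...</content>, including multi-line bodies.
FIELD_RE = re.compile(r"<(thought|content)>(.*?)</\1>", re.DOTALL)


def parse_xml_response(text: str) -> Optional[dict[str, str]]:
    """Return the two fields, or None if the response is malformed."""
    fields = {name: body.strip() for name, body in FIELD_RE.findall(text)}
    if set(fields) != {"thought", "content"} or not fields["content"]:
        return None
    return fields


def complete_with_retry(ask: Callable[[], str], max_attempts: int = 3) -> dict[str, str]:
    """Call `ask` until its output parses, giving up after `max_attempts` tries."""
    for _ in range(max_attempts):
        parsed = parse_xml_response(ask())
        if parsed is not None:
            return parsed
    raise ValueError("response never matched the template")
```

Because both fields end up in ordinary message content, the `thought` text can be stored and replayed to the model like any other message.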
Built-in templates:
- Empty: no-op passthrough (default for backward compatibility)
- XML: `<thought>`/`<content>` tags with regex extraction
- JSON: `{"thought", "content"}` object with lenient parsing
- Markdown Headers: `## Thought`/`## Content` section splitting
- XML with Emotions: extends XML with an `<emotions>` field
Templates are selected during /start onboarding (or via --template in CLI). Each template supports user-configurable parameters — field names, paragraph counts, emotion counts — that adapt instructions, examples, parsing, and validation dynamically. Use /toggle <field> to hide or show individual fields per chat.
Per-bot template filtering is supported via allowed_templates in bots.toml.
Migrate your chat history from other AI tools:
- Via Telegram: Send `/import`, pick a model, and upload a JSON file; messages are replayed into the chat with formatting and rate limiting
- Via CLI: `mai-chat -c test-demo --import-json path/to/messages.json`
Supports OpenAI chat format and AI Proxy v2 request JSON.
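To check a file before importing it, a small validator along these lines may help. It covers only plain-text messages in the OpenAI chat shape (objects with `role` and `content`); accepting either a bare list or an object with a `messages` key is an assumption, and `load_openai_messages` is a hypothetical helper rather than part of mai-gram.

```python
import json

VALID_ROLES = {"system", "user", "assistant", "tool"}


def load_openai_messages(raw: str) -> list[dict[str, str]]:
    """Parse JSON text and return validated {'role', 'content'} messages."""
    data = json.loads(raw)
    # Assumption: the file is either a bare list or {"messages": [...]}.
    messages = data["messages"] if isinstance(data, dict) else data
    cleaned = []
    for i, msg in enumerate(messages):
        role, content = msg.get("role"), msg.get("content")
        if role not in VALID_ROLES or not isinstance(content, str):
            raise ValueError(f"message {i} is not a plain-text chat message")
        cleaned.append({"role": role, "content": content})
    return cleaned
```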
A full command-line interface for testing and debugging without Telegram:
# Create a new chat
mai-chat -c test-demo --start --model openai/gpt-4o-mini --prompt default
# Chat
mai-chat -c test-demo "What is 2+2?"
# Slash commands
mai-chat -c test-demo --command help
mai-chat -c test-demo --command "timezone Europe/Moscow"
# Inspect
mai-chat -c test-demo --history
mai-chat -c test-demo --wiki
mai-chat -c test-demo --show-prompt
# Repair wiki index
mai-chat -c test-demo --repair-wiki
See docs/DEBUGGING.md for the full CLI reference.
| Variable | Required | Default | Description |
|---|---|---|---|
| `TELEGRAM_BOT_TOKEN` | Legacy | - | Primary bot token (ignored if bots.toml exists) |
| `OPENROUTER_API_KEY` | Yes | - | OpenRouter API key |
| `OPENROUTER_BASE_URL` | No | `https://openrouter.ai/api/v1` | API base URL (for proxy) |
| `LLM_MODEL` | No | `openai/gpt-4o-mini` | Default model |
| `DATABASE_URL` | No | `sqlite+aiosqlite:///./data/mai_gram.db` | Database URL |
| `ALLOWED_USERS` | No | - | Comma-separated Telegram user IDs |
| `DEBUG` | No | `false` | Enable debug mode |
See docs/CONFIGURATION.md for the full reference including MEMORY_DATA_DIR, WIKI_CONTEXT_LIMIT, SHORT_TERM_LIMIT, LLM timeout settings, and data directory layout.
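As a rough illustration of how the variables and defaults in the table fit together, the sketch below reads them into a frozen dataclass. The `Settings` class and `load_settings` function are hypothetical, and the set of spellings accepted as a true `DEBUG` value is an assumption; the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    openrouter_base_url: str = "https://openrouter.ai/api/v1"
    llm_model: str = "openai/gpt-4o-mini"
    database_url: str = "sqlite+aiosqlite:///./data/mai_gram.db"
    allowed_users: frozenset[int] = frozenset()
    debug: bool = False


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the documented defaults."""
    ids = env.get("ALLOWED_USERS", "")
    return Settings(
        openrouter_api_key=env["OPENROUTER_API_KEY"],  # required: KeyError if unset
        openrouter_base_url=env.get("OPENROUTER_BASE_URL", Settings.openrouter_base_url),
        llm_model=env.get("LLM_MODEL", Settings.llm_model),
        database_url=env.get("DATABASE_URL", Settings.database_url),
        allowed_users=frozenset(int(x) for x in ids.split(",") if x.strip()),
        # Assumed truthy spellings; the real parser may accept others.
        debug=env.get("DEBUG", "false").lower() in {"1", "true", "yes"},
    )
```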
Each [models."<key>"] section defines an available model. Models are enabled by default; set enabled = false to hide one without removing its config.
[models]
default = "openai/gpt-4o-mini"
[models."openai/gpt-4o"]
title = "GPT-4o"
[models."openai/gpt-4o-mini"]
title = "GPT-4o Mini"
temperature = 0.7
[models."google/gemini-2.5-flash"]
title = "Gemini 2.5 Flash"
reasoning.effort = "medium"
provider.order = ["Google AI Studio"]
# Same model with different parameters via alias
[models."gemini-flash-creative"]
id = "google/gemini-2.5-flash"
title = "Gemini Flash (creative)"
temperature = 1.5
Changes are hot-reloaded, so no restart is required.
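The alias mechanism can be read as a simple lookup: the section key is the model id unless an `id` field overrides it. The sketch below works on the dictionary a TOML parser would produce for the `[models]` table. Which keys count as metadata rather than request parameters is an assumption, and `resolve_model` is a hypothetical name.

```python
from typing import Any

# Assumption: these keys describe the entry; everything else is sent as a request parameter.
METADATA_KEYS = {"id", "title", "enabled"}


def resolve_model(models: dict[str, Any], key: str) -> tuple[str, dict[str, Any]]:
    """Return (model id to request, request parameters) for a configured key."""
    entry = models[key]
    if not entry.get("enabled", True):  # models are enabled unless stated otherwise
        raise LookupError(f"model {key!r} is disabled")
    model_id = entry.get("id", key)  # an alias points at the real id
    params = {k: v for k, v in entry.items() if k not in METADATA_KEYS}
    return model_id, params


MODELS = {
    "default": "openai/gpt-4o-mini",
    "openai/gpt-4o-mini": {"title": "GPT-4o Mini", "temperature": 0.7},
    "gemini-flash-creative": {
        "id": "google/gemini-2.5-flash",
        "title": "Gemini Flash (creative)",
        "temperature": 1.5,
    },
}
```

With this data, `resolve_model(MODELS, "gemini-flash-creative")` requests `google/gemini-2.5-flash` with `temperature` set to 1.5.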
Place .txt files in the prompts/ directory. Users see these as options during /start alongside "Custom (type your own)". Each prompt can have a companion .toml config for tool filtering and display settings.
Telegram User ──▶ Telegram Bot(s) ──▶ mai-gram ──▶ OpenRouter ──▶ LLM
                                         │
                          ┌──────────────┼──────────────┐
                          │              │              │
                    SQLite DB     Wiki .md files    MCP tools
                    (messages,    (source of        (wiki, messages,
                     chat config,  truth for         external)
                     wiki index)   knowledge)
Wiki entries live as markdown files on disk (data/<chat_id>/wiki/*.md) and are indexed in SQLite for fast querying. The files are the source of truth — the database index is automatically rebuilt from disk on every message and can be manually repaired with mai-chat --repair-wiki.
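The "files are the source of truth" design means the index can always be thrown away and rebuilt. A minimal sketch of that idea, assuming a hypothetical `wiki_index` table with `name` and `body` columns (the real schema is not shown here):

```python
import sqlite3
from pathlib import Path


def rebuild_wiki_index(wiki_dir: Path, conn: sqlite3.Connection) -> int:
    """Refill a hypothetical `wiki_index` table from the .md files on disk."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wiki_index (name TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    rows = [(p.stem, p.read_text(encoding="utf-8")) for p in sorted(wiki_dir.glob("*.md"))]
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute("DELETE FROM wiki_index")
        conn.executemany("INSERT INTO wiki_index (name, body) VALUES (?, ?)", rows)
    return len(rows)
```

Since nothing in the table is authoritative, a manual edit to a file only needs the next rebuild to become visible.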
All AI tools are implemented as Model Context Protocol servers — the open standard for LLM tool interfaces. This means:
- Wiki MCP server: the AI manages its own knowledge base by calling the `wiki_create`, `wiki_edit`, `wiki_search`, and `wiki_list` tools
- Messages MCP server: lets the AI search and reference past conversation history
- External MCP server: plug in any third-party MCP-compatible tool server via config. The bot reads the same `~/.cursor/mcp.json` format that Cursor uses, so any MCP server you've already set up for Cursor is one line away from being available to the bot too:
# config/models.toml
[mcp]
mcp_config_path = "~/.cursor/mcp.json"
external_servers = ["exa"] # pick which servers to expose to the AI
Using MCP as the internal tool protocol means the codebase is compatible with the broader AI tooling ecosystem out of the box, and adding new capabilities follows a well-defined interface rather than custom glue code.
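Selecting servers by name from a Cursor-style file amounts to a filtered lookup. The sketch below assumes the file keeps its server definitions under a top-level `mcpServers` object, as Cursor's format does; `select_servers` is a hypothetical helper, and how mai-gram launches the selected servers is not covered.

```python
import json
from typing import Any


def select_servers(mcp_json: str, wanted: list[str]) -> dict[str, Any]:
    """Return only the named server definitions from an mcp.json document."""
    servers = json.loads(mcp_json).get("mcpServers", {})
    missing = [name for name in wanted if name not in servers]
    if missing:
        raise KeyError(f"not defined in the MCP config: {missing}")
    return {name: servers[name] for name in wanted}
```

Failing loudly on an unknown name is a design choice for the sketch: a typo in `external_servers` would otherwise silently remove a tool.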
See docs/DEVELOPMENT.md for the full architecture and docs/DEBUGGING.md for troubleshooting.
make install-dev # Install with dev dependencies + git hooks
make test # Run all tests
make check # Lint + format + typecheck + size audit
make precommit # Full pre-commit quality gate
Run make help to see all available targets (run, chat CLI, docker, quality gates).
The test suite is organized into four tiers, ordered from fastest to slowest:
| Tier | Count | What it tests |
|---|---|---|
| Unit | majority | Pure logic, isolated modules with mocks |
| Integration | — | Multi-module workflows with a stub LLM provider, real in-process DB |
| Functional (local) | — | Black-box CLI subprocess tests — spawns the real mai-chat binary, no API key needed |
| Functional (live) | — | End-to-end tests hitting the real OpenRouter API |
The black-box functional tier deserves special mention: tests spawn mai-chat as an isolated subprocess and exercise the full stack through its CLI surface — no mocking of internals. This catches integration issues that unit tests miss and documents expected CLI behavior as executable specs.
Parallel execution via pytest-xdist with work-stealing runs unit and functional tiers concurrently (~55s vs ~170s serial). Integration tests run serially in a separate step to avoid global-state conflicts.
The pre-commit hook runs all four tiers in order — fast failures are caught before expensive live API calls.
- Ruff: linting and formatting with an extensive rule set (`E, F, I, N, W, UP, B, A, SIM, TCH, C4, PTH, DTZ, S` and more)
- mypy strict: full strict type checking, no implicit `Any`, no untyped defs
- Code-size audit: an automated check enforces a 500-line file limit and a 60-line function limit to keep refactoring pressure visible
- 92%+ coverage: enforced by `pyproject.toml` as the single source of truth, checked on every commit
- Pre-commit hooks: run the full quality gate automatically, including live functional tests when `OPENROUTER_API_KEY` is available
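A code-size audit of this kind can be approximated with the standard `ast` module. The sketch below uses the 500-line and 60-line limits quoted above, but how the project's own audit counts lines (blank lines, decorators, docstrings) is an assumption, and `audit_source` is a hypothetical name.

```python
import ast

MAX_FILE_LINES = 500
MAX_FUNC_LINES = 60


def audit_source(source: str, name: str = "<string>") -> list[str]:
    """Return one message per file or function that exceeds its line limit."""
    problems = []
    total = len(source.splitlines())
    if total > MAX_FILE_LINES:
        problems.append(f"{name}: {total} lines (limit {MAX_FILE_LINES})")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count from the `def` line to the last line of the body, inclusive.
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNC_LINES:
                problems.append(
                    f"{name}:{node.lineno} {node.name}() is {length} lines (limit {MAX_FUNC_LINES})"
                )
    return problems
```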
See docs/DEVELOPMENT.md for setup, project structure, and the full testing guide.
MIT