Releases: TARS-AI-Community/TARS-AI
OS Amelia — Major Milestone
The initial OS Amelia release is a large architectural update bringing a Pi compatibility layer, face recognition groundwork, an app system, and a screensaver restructuring.
New Features
Skills Plugin System
Replaced the monolithic ~530-line if-elif dispatch chain in module_llm.py with an auto-discovered plugin architecture.
- Each LLM tool is a standalone `skill_*.py` file in `src/skills/`
- Skills are auto-discovered at startup — drop a file in, restart, it works
- Each skill exports a `SKILL` dict (name + LLM prompt definition + optional examples) and an `execute()` function
- Skill prompt definitions and few-shot examples are dynamically injected into the system prompt
- 19 skills: web_search, capture_camera_view, take_photo, execute_movement, generate_image, adjust_persona, identify_speaker_name, volume, browser, home_assistant, system_control, launch_retropie, reminder, discord, sandbox_exec, network_camera, generate_music, tars_radio, example
- `module_engine.py` (`FUNCTION_REGISTRY`) removed entirely
- Multi-skill fix: when multiple skills return results in one LLM turn, replies are joined instead of overwriting
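The auto-discovery loop described above can be sketched roughly as follows — assuming each plugin is a `skill_*.py` file exporting a `SKILL` dict and an `execute()` function. The file layout and validation here are illustrative, not the project's exact API.

```python
# Hypothetical sketch of skill plugin auto-discovery.
import importlib.util
from pathlib import Path

def discover_skills(skills_dir):
    """Load every skill_*.py file and return {name: module} for valid skills."""
    registry = {}
    for path in sorted(Path(skills_dir).glob("skill_*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        # A valid skill must export a SKILL dict and a callable execute()
        if isinstance(getattr(mod, "SKILL", None), dict) and callable(
            getattr(mod, "execute", None)
        ):
            registry[mod.SKILL["name"]] = mod
    return registry
```

Dropping a new `skill_*.py` file into the directory and restarting would make it appear in the registry with no dispatch-chain edits — the property the release highlights.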
Streaming LLM + Sentence TTS Pipeline
TARS now speaks after the first sentence instead of waiting for the full LLM reply.
- `SentenceTTSPipeline` splits streaming LLM tokens at sentence boundaries (min 20 chars, min clause 12 chars)
- Voice mode and web UI share the same pipeline backend
- Device speakers and web browser play in parallel with LLM generation
- Real-time text tokens appear in the web UI as they stream
- Follow-up message fix: web search and other tools that modify the reply now emit and speak the updated text correctly
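The sentence-boundary chunking above can be sketched as a generator over the token stream, using the release's stated 20-character minimum. The real `SentenceTTSPipeline` also handles clause splits and TTS dispatch; this is a minimal illustration.

```python
# Sketch: emit speakable sentences as soon as a boundary appears in the stream.
import re

MIN_SENTENCE_CHARS = 20  # stated minimum sentence length in this release

def stream_sentences(tokens):
    """Yield speakable chunks without waiting for the full LLM reply."""
    buffer = ""
    for tok in tokens:
        buffer += tok
        # Emit the first boundary whose prefix is long enough; a too-short
        # sentence is merged into the next one instead of being spoken alone.
        for m in re.finditer(r"[.!?]\s", buffer):
            if m.end() >= MIN_SENTENCE_CHARS:
                yield buffer[: m.end()].strip()
                buffer = buffer[m.end():]
                break
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends
```

Each yielded chunk can be handed to TTS immediately, so speech starts after the first sentence while the LLM keeps generating.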
Speaker ID Rewrite
Complete rewrite of voice identification.
- Three-tier threshold model: consistency (0.55), soft match (0.65), high-confidence (0.75)
- Atomic saves: writes temp file then renames to prevent corruption
- Auto-merge: duplicate unknown profiles are merged automatically
- Real names in memory: recalled memories and prompts now show actual speaker names instead of `unknown`
- `wait_for_identification()`: LLM processing waits up to 2.0 s for speaker ID to resolve before displaying the user message
- Deferred name correction: speaker name on UI message bubble updates via socket event after identification completes
- Barge-in fix: pre-roll audio chunks excluded from speaker ID embeddings to prevent TARS's own TTS voice from poisoning speaker detection
- Browser audio: browser mic audio submitted to speaker ID for web UI voice mode
- Default threshold raised from 0.5 → 0.75
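The three-tier threshold model can be sketched as a simple classification over the similarity score, using the thresholds stated above (0.55 / 0.65 / 0.75); the tier labels here are illustrative names, not the project's.

```python
# Hedged sketch of the three-tier threshold model for voice similarity scores.
CONSISTENCY = 0.55      # same voice as recent turns, identity unsure
SOFT_MATCH = 0.65       # likely match; may need corroboration
HIGH_CONFIDENCE = 0.75  # safe to attach the speaker's name

def classify_similarity(score):
    """Map a similarity score against a stored voice profile to a tier."""
    if score >= HIGH_CONFIDENCE:
        return "high_confidence"
    if score >= SOFT_MATCH:
        return "soft_match"
    if score >= CONSISTENCY:
        return "consistent"
    return "unknown"
```

Raising the default from 0.5 to 0.75 means only the top tier confirms identity outright, which trades a few missed matches for far fewer false attributions.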
Dashboard Tab
New dashboard tab in the web UI with 5 sub-tabs:
- Graph — D3.js memory/knowledge graph with detail drawer
- Mood — Emotion bar chart, timeline, activity heatmap
- Log — Interaction audit trail
- Topics — Topic extraction table and network graph
- Prompt — Full prompt debugger with expand/collapse
Backed by module_dashboard_data.py: logs every interaction with timestamp, emotion, speaker, and LLM prompt. Tracks emotional state with half-life decay for the radar chart.
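Half-life decay of tracked emotion can be sketched in one line; the 300-second half-life below is an assumed value for illustration, not the module's actual constant.

```python
# Sketch of emotional-state decay as module_dashboard_data.py is described.
def decayed_intensity(initial, elapsed_s, half_life_s=300.0):
    """Exponential decay: the tracked emotion halves every half_life_s seconds."""
    return initial * 0.5 ** (elapsed_s / half_life_s)
```

Each interaction would bump the relevant emotion's intensity, and the chart reads the decayed value, so recent events dominate without old ones vanishing abruptly.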
Heartbeat System & Reminder Skill
New module_heartbeat.py provides a lightweight scheduler for one-shot and recurring timed tasks.
- New `skill_reminder.py` lets TARS set reminders via voice/text ("remind me in 10 minutes to…")
- Task persistence: reminders survive restarts
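A one-shot/recurring scheduler in the spirit of `module_heartbeat.py` can be sketched with a priority queue; the `Heartbeat` class, its `tick()` API, and the return values are assumptions for illustration.

```python
# Hypothetical sketch of a lightweight one-shot/recurring task scheduler.
import heapq

class Heartbeat:
    """Priority-queue scheduler: tick(now) runs every task that is due."""
    def __init__(self):
        self._queue = []  # entries: (due_time, tiebreak, callback, interval)

    def schedule(self, due_time, callback, interval=None):
        # id(callback) breaks ties so heapq never compares callables
        heapq.heappush(self._queue, (due_time, id(callback), callback, interval))

    def tick(self, now):
        fired = []
        while self._queue and self._queue[0][0] <= now:
            due, _, cb, interval = heapq.heappop(self._queue)
            fired.append(cb())
            if interval is not None:          # recurring task: re-arm
                self.schedule(due + interval, cb, interval)
        return fired
```

Persistence across restarts would then amount to serializing the queue entries to disk on schedule and reloading them at boot.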
Identity Coordinator (module_identity.py)
Fuses voice ID and face recognition into a unified identity context.
- Voice is authoritative (who is speaking)
- Falls back to face recognition if exactly one known person is visible
- Provides a unified `get_identity_context()` for prompt injection
- `module_speaker_id.py` and face recognition feed into it
Atomik Wake Word — CNN/ONNX + Multi-Gate Pipeline
Complete rewrite of the wake word system from simple template matching to a dual-mode architecture.
Template mode (cosine similarity — preferred):
- Records 5 template utterances of the wake word during setup
- Augments templates with time-stretch, pitch-shift, gain variation, noise addition
- MFCC cosine similarity matching against recorded templates
- Most reliable for custom wake words out of the box
Model mode (CNN/ONNX neural network):
- Pre-trained ONNX model loaded from `tts/{wake_word}.onnx`
- `TinyClassifier` — pure-numpy 1D CNN with Adam optimizer, trained on-device as a fallback
- 5-gate pre-filter pipeline: energy, adaptive SNR, crest factor, spectral speech band, low-frequency rejection
- Confirmation window (2 above-threshold scores in 5 checks)
- Post-detection `_is_speech_like()` validation (ZCR, energy dynamics, syllable counting)
- Speed-variation retry for near-threshold scores
- MFCC + delta + delta-delta features (39 channels) for ONNX models
- Synthetic negative audio generation for training (40 clips: white/pink noise, babble, tonal, rumble, clicks)
Config: atomik_mode = auto | model | template
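The core of template mode is cosine similarity between an MFCC feature vector of the incoming audio and each recorded template. Feature extraction itself is out of scope here; the vectors are assumed to be flattened MFCC matrices of equal length, and the 0.85 threshold is an assumed value.

```python
# Sketch of template-mode wake word matching via cosine similarity.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def matches_wake_word(features, templates, threshold=0.85):
    """Fire when the best template similarity clears the threshold."""
    best = max((cosine_similarity(features, t) for t in templates), default=0.0)
    return best >= threshold
```

Augmenting the five recorded templates (time-stretch, pitch-shift, gain, noise) widens the set being maxed over, which is why template mode is robust for custom wake words out of the box.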
Wake Word Gates
Post-detection gate system applied to all wake word backends (atomik, sherpa-onnx, fastrtc).
Three independent gates configured in [STT]:
- Speaker verification (`vad_speaker_verify`) — Extracts a voice embedding from the wake audio and matches it against enrolled speakers. Modes: `off`, `any` (any enrolled speaker), or a specific speaker name. Uses a lower threshold (0.60) than normal speaker ID (0.75).
- Presence gate (`vad_presence_gate`) — Requires a face on camera via `IdentityManager`. Modes: `off`, `any` (any face), `known` (enrolled face only).
- Transcript verification (`vad_transcript_verify`) — Transcribes the wake audio via the best available local STT and confirms the wake word was actually spoken, with exact + fuzzy matching.
All gates are optional and disabled by default. They run synchronously after initial detection and before triggering the wake word callback.
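The synchronous gate chain can be sketched as a loop that only consults gates whose config mode is not `off`; the gate callables and config dict here are illustrative stand-ins for the `[STT]` settings.

```python
# Sketch: run every enabled gate after raw detection; any failure vetoes.
def run_gates(config, wake_audio, gates):
    """Return True only if every enabled gate passes.

    config: {"vad_speaker_verify": "off" | ..., ...}
    gates:  {config_key: callable(mode, wake_audio) -> bool}
    """
    for key, check in gates.items():
        mode = config.get(key, "off")
        if mode == "off":
            continue                  # disabled gates never block
        if not check(mode, wake_audio):
            return False              # any failing enabled gate vetoes
    return True
```

Because the default mode for every gate is `off`, the chain is a no-op unless the user opts in, matching the release's "disabled by default" behavior.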
New Infrastructure Modules
Four new modules extracted from the growing codebase:
- `module_mic.py` (~561 lines) — Shared microphone utilities. Detects the native sample rate, provides resampling helpers, and an `_AudioHub` singleton for shared mic access across consumers.
- `module_router.py` (~186 lines) — Message routing. Tracks where the user is interacting (voice/web UI/Discord) and delivers responses to the correct target.
- `module_state.py` (~92 lines) — Thread-safe application state machine. `TarsState` enum (BOOTING, STANDBY, LISTENING, THINKING, TALKING) with listener callbacks.
- Audio output autodetection (`module_tts.py`) — Automatically discovers and selects the correct audio output device at startup. Prioritizes USB audio > hardware devices > virtual ALSA devices. Skips HDMI outputs that would route audio to a monitor instead of the speaker. Used globally by TTS, indicator beeps, and radio streaming.
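The stated priority order (USB audio first, then other hardware devices, HDMI skipped) can be sketched as a ranking function; the device dicts below are a simplified stand-in for what an ALSA enumeration would return.

```python
# Sketch of audio output autodetection with the release's priority order.
def pick_output_device(devices):
    """devices: list of {"name": str, "is_hardware": bool}; returns best or None."""
    # HDMI outputs would route audio to a monitor, so they are excluded outright
    candidates = [d for d in devices if "hdmi" not in d["name"].lower()]

    def rank(d):
        if "usb" in d["name"].lower():
            return 0                 # USB audio wins
        if d["is_hardware"]:
            return 1                 # then other hardware outputs
        return 2                     # virtual ALSA devices last

    return min(candidates, key=rank, default=None)
```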
Raspberry Pi Zero 2 Support
First release with full Pi Zero 2 W support — a low-cost ($15) entry point for TARS-AI.
- New `PIZERO2` device profile with tailored capability gating
- Cloud-only backends: OpenAI STT, cloud TTS (ElevenLabs, OpenAI), cloud LLM
- Lite keyword-based memory (no embeddings, no vector search)
- Minimal dependency footprint (~300MB vs ~2.5GB on Pi5)
- Auto-detected via `/proc/device-tree/model` at runtime
- Pi version stored in `src/memory/pi_version` for subsequent installs
- Graceful degradation: features that require more compute are automatically disabled
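Detection from the device-tree model string can be sketched as a simple substring match; the profile names mirror the release notes, while the parsing and the `GENERIC` fallback are assumptions.

```python
# Sketch of mapping /proc/device-tree/model contents to a device profile.
def detect_pi_profile(model_string):
    """Map the device-tree model string to a device profile name."""
    model = model_string.lower()
    if "pi zero 2" in model:
        return "PIZERO2"
    if "raspberry pi 5" in model:
        return "PI5"
    return "GENERIC"
```

At runtime the string would come from something like `Path("/proc/device-tree/model").read_text()`, and the resolved profile would drive the capability gating described above.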
New Skills
Six new skill plugins added:
- `skill_discord.py` (~248 lines) — Discord bot integration, replaces the old `module_discord.py`
- `skill_sandbox_exec.py` (~314 lines) — Sandboxed code execution
- `skill_network_camera.py` (~436 lines) — Network camera access
- `skill_generate_music.py` (~411 lines) — Music generation
- `skill_tars_radio.py` (~1,029 lines) — TARS radio functionality
- `skill_example.py` (~110 lines) — Example/template skill for reference
Modules Inlined or Removed
Several standalone modules were merged into skills or removed:
| Module | Status |
|---|---|
| `module_engine.py` | Deleted (was already dead code) |
| `module_secrets.py` | Deleted |
| `module_browser.py` | Inlined into `skill_browser.py` |
| `module_volume.py` | Inlined into `skill_volume.py` |
| `module_discord.py` | Replaced by `skill_discord.py` |
| `module_homeassistant.py` | Inlined into `skill_home_assistant.py` |
Web UI: Browser Voice Mode
Full voice chat in the browser:
- Web Audio API captures microphone at 16kHz
- Audio sent as base64 via the SocketIO `browser_audio` event
- Local sherpa-onnx transcription with OpenAI Whisper fallback
- RMS silence rejection filters empty audio
- Bot replies streamed as tokens (`bot_token`) + audio chunks (`bot_audio_chunk`)
- `ChunkedAudioPlayer` handles queue-based playback with barge-in
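The inbound half of this path — decode base64 PCM from the socket payload and drop near-silent clips by RMS — can be sketched as below. The assumption that the payload is little-endian 16-bit PCM, and the RMS threshold of 500 on int16 samples, are illustrative, not the project's exact values.

```python
# Sketch of the browser_audio decode + RMS silence rejection step.
import base64
import math
import struct

def decode_browser_audio(b64_payload):
    """Return a list of int16 samples from a base64 little-endian PCM payload."""
    raw = base64.b64decode(b64_payload)
    usable = len(raw) // 2 * 2                     # drop any trailing odd byte
    return list(struct.unpack(f"<{usable // 2}h", raw[:usable]))

def is_silent(samples, rms_threshold=500):
    """Reject clips whose root-mean-square energy is below the threshold."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < rms_threshold
```

Clips failing the RMS check would be dropped before transcription, saving an STT round-trip on empty audio.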
Web UI: Themes
15 CSS themes, auto-discovered from src/www/static/css/themes/:
arctic, chatgpt, claude, cyberpunk, default, gemini, gi_joe, glados, grogu, grok, hal9000, linux, microsoft, retro, starwars
Selectable via [ACCESS] webui_theme in config.ini.
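Auto-discovery here plausibly amounts to scanning the themes directory for `.css` files and using each filename stem as a theme name — a sketch, not the project's exact implementation.

```python
# Sketch: any .css file dropped into the themes directory becomes a theme.
from pathlib import Path

def discover_themes(themes_dir):
    """Return sorted theme names derived from *.css filenames."""
    return sorted(p.stem for p in Path(themes_dir).glob("*.css"))
```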
Web UI: Login Authentication
- Password-protected web interface
- Session-based auth with `flask.session`
- Random initial password generated on first install
- Dedicated `login.html` page
- Security fix: Flask secret key is now randomized (was hardcoded, allowing auth bypass)
- Security fix: emotions endpoint XSS-hardened
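The secret-key fix matters because Flask signs session cookies with this key: a hardcoded key lets anyone forge a logged-in session. A common pattern — sketched here with an illustrative file location — is to generate the key once and persist it:

```python
# Sketch of a persisted, randomized Flask secret key (file path is illustrative).
import secrets
from pathlib import Path

def load_or_create_secret_key(path):
    """Return the stored secret key, generating and saving one on first boot."""
    key_file = Path(path)
    if key_file.exists():
        return key_file.read_text().strip()
    key = secrets.token_hex(32)       # 256 bits of randomness, hex-encoded
    key_file.write_text(key)
    return key
```

Persisting the key keeps existing sessions valid across restarts while still being unique per install.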
Web UI: New Tabs and Controls
- Body tab — Combined arm/leg servo sliders
- WiFi tab — Network scanning, connect/disconnect, hotspot toggle (TARS-Setup AP)
- **SYS (Nex...
4.0
🚀 RELEASE v4.0: TARS V2, Hardware Overhaul & Smarter AI Features!
The TARS-AI 4.0 release introduces a major leap forward with TARS V2 hardware, improved AI responsiveness, advanced battery monitoring, enhanced UI controls, and numerous 3D design refinements. This is the most feature-complete TARS-AI version to date, blending faster neural processing with smarter integrations and a fully reworked mechanical design.
🔥 Main Highlights
TARS V2 Hardware Upgrade
New main leg mounts and mechanical refinements for better stability.
Static TARS case option (no motors) and updated design without heat inserts.
Fixed lid clearance issues with the camera and speaker mounts.
Hybrid RAG & Neural Net Enhancements
Significantly faster Hybrid RAG response times.
Integrated neural net support for smarter motion control and decision-making.
UI & Web Interface Upgrades
New motion control tab in ChatUI for seamless real-time control.
Refined UI font sizes, fullscreen toggle, and better overall visual polish.
Battery & Power Monitoring
INA260-based battery support for real-time voltage and current tracking.
FPS variable for smoother display and control updates.
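Reading the INA260 comes down to scaling its raw registers by fixed LSB sizes — 1.25 mV per bus-voltage bit and 1.25 mA per current bit per the INA260 datasheet. The register access itself (I²C) is omitted in this sketch.

```python
# Sketch of INA260 raw-register conversion using the datasheet LSB sizes.
def bus_voltage_v(raw):
    """Bus voltage register: 1.25 mV per bit."""
    return raw * 1.25e-3

def current_a(raw_signed):
    """Current register: 1.25 mA per bit, two's-complement signed."""
    return raw_signed * 1.25e-3
```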
🔥 New Features & Upgrades
Azure Voice Modifications: Enhanced voice synthesis and flexibility.
OpenAI TTS Option: Additional TTS engine choice for customizable voices.
Config.ini Improvements: Toggle camera directly from configuration settings.
3D CAD Files: Added TARS CAD STEP file and corrections to printed parts.
⚙️ Technical & Performance Improvements
Fixed 90° and 270° rotation issues with the new layout logic.
Optimized engine and BT controller modules for smoother performance.
Numerous 3D print corrections and incremental V2 updates for reliability.
🔄 What’s Changed
Added image to README by @mossed1
Merged development into main branch by @pyrater
Improved Hybrid RAG response and Picovoice support by @latishab
UI improvements and motion control tab by @pyrater & @atomikspace
Font size fixes, fullscreen toggle, and camera config toggle by @atomikspace
Battery monitoring via INA260 by @atomikspace
TARS V2 hardware updates and multiple 3D print corrections by @atomikspace
Azure voice modifications and OpenAI TTS option by @atomikspace
Fixed JSON output format for movement LLM calls by @Autosuffisant
Fixed lid clearance issues with camera and speaker mount by @mossed1
Updated engine and BT controller modules by @mossed1
🎉 New Contributors
@mossed1 made their first contribution
@Autosuffisant made their first contribution
Special thanks to @atomikspace, who has done an enormous amount of heavy lifting (most of the changes) to get the V2 rework out the door.
📜 Full Changelog: 3.0 → 4.0
👉 View All Commits
3.0
🚀 RELEASE v3.0: Faster, Smarter, and Now With a Fancy WebUI!
The TARS-AI 3.0 release brings a massive overhaul with improved speech capabilities, a sleek new chat WebUI, better memory handling, and countless refinements. This is the most powerful and user-configurable version of TARS-AI yet.
🔥 Main Highlights
Text-to-Speech (TTS) & Speech-to-Text (STT) Overhaul
Refactored TTS and STT for better efficiency and flexibility.
Added Eleven Labs TTS for high-quality voice synthesis.
Integrated Silero VAD (Voice Activity Detection) for smarter audio processing.
DEEPINFRA LLM option added.
TTS now streams per sentence in chunks, making responses significantly faster.
🔥 New Features & Upgrades
Chat WebUI: Fully interactive web-based chat interface for easier communication.
Expanded User Configuration: More settings added to config.ini for deeper customization.
Message Queue System: Terminal now processes messages more efficiently with queuing.
Memory / RAG Enhancements: Upgraded retrieval-augmented generation (RAG) from Latisha, making long-term recall more accurate and fluid.
Wiki Created: Comprehensive documentation for setup, features, and customization.
🔥 File & Structure Improvements
Reorganized and updated 3D files for better organization.
Updated Character Card: More options for defining character traits and personalities.
⚙️ Technical & Performance Improvements
Faster Response Time: TTS chunked streaming greatly reduces lag.
Bug Fixes Galore: Squashed countless bugs for a smoother experience.
Optimized Memory Handling: Smarter memory system improves both speed and accuracy.
Refactored Codebase: Cleaner, more maintainable structure across core modules.
🔄 What's Changed
Added Eleven Labs TTS by @latishab
Added DEEPINFRA LLM by @latishab
Integrated Silero VAD by @latishab
Implemented Chat WebUI by @pyrater
Refactored TTS & STT pipeline by @pyrater
Improved RAG Memory from Latisha by @latishab
Message Queue System for Terminal by @pyrater
Updated and reorganized 3D files by @pyrater
Created Wiki for Documentation by @latishab
Readme Updates by @mossed1
🎉 New Contributors
@pyrater made their first contribution
@xanomanox made their first contribution
@DaftNinja made their first contribution
@mspierg made their first contribution
@holmesha made their first contribution
@BiosUp made their first contribution
@latishab had many updates merged by @pyrater.
📜 Full Changelog: 2.0 → 3.0
👉 View All Commits
Get ready for the fastest, smartest, and most customizable version of TARS-AI yet! 🚀
RELEASE v2.0: Now with 100% more sarcasm and 0% chances of docking mishaps!
TARS-AI 2.0 Development Branch Update
The TARS-AI 2.0 development branch introduces several enhancements over the initial 1.0 release, including:
🚀 Core Additions
Speech-to-Text (STT) Enhancements:
- Integrated Whisper for local STT alongside Vosk and external server options.
- Added STT sensitivity configuration for improved accuracy.
- Comprehensive improvements to the STT system for better transcription.
- Significant speed improvements.
- Dynamic microphone quality selection based on hardware.
Character Customizations:
- Persistent personality settings (e.g., humor and tone) for dynamic character interaction.
- Expanded character settings and behavior customization options.
Function Calling:
- Dual methods for function calls: Naive Bayes (NB) and LLM-based approaches.
Voice-Controlled Movement:
- Enabled voice commands to control robotic movements with precise mapping.
Image Generation:
- Integrated DALL·E and Stable Diffusion for AI-powered image creation.
Volume Control:
- Fine-tuned volume adjustments through both voice commands and external configurations. Credit: @MSKULL
Home Assistant Integration:
- Seamless connection with smart home systems.
⚙️ Technical Improvements
- Reworked LLM function into its own module.
- Reworked LLM function to be compatible with thinking models (Deepseek R1).
- Reworked build prompt function for easy importing.
- Reworked memory module to ensure correct prompt and memory management.
- Reworked tokenization for proper counts.
- Renamed STL files for better readability.
- Many fixes and tweaks for better usability.
- Over 196 commits.
Override Encoding Model:
- Enhanced compatibility with models using `override_encoding_model`.
TTS Fixes:
- Resolved issues with special characters in Text-to-Speech.
What's Changed
- Update config.ini.template by @DaftNinja in https://github.com/pyrater/TARS-AI/pull/3
- V9-misc-mods by @xanomanox in https://github.com/pyrater/TARS-AI/pull/4
- add M3 grub screws to fasteners by @xanomanox in https://github.com/pyrater/TARS-AI/pull/5
- updated parts list (in progress) by @xanomanox in https://github.com/pyrater/TARS-AI/pull/6
- Feature/volume by @mspierg in https://github.com/pyrater/TARS-AI/pull/8
- Update training_data.csv by @holmesha in https://github.com/pyrater/TARS-AI/pull/10
- Dev by @holmesha in https://github.com/pyrater/TARS-AI/pull/11
- added openai to requirements.txt by @BiosUp in https://github.com/pyrater/TARS-AI/pull/12
New Contributors
- @DaftNinja made their first contribution in https://github.com/pyrater/TARS-AI/pull/3
- @xanomanox made their first contribution in https://github.com/pyrater/TARS-AI/pull/4
- @mspierg made their first contribution in https://github.com/pyrater/TARS-AI/pull/8
- @holmesha made their first contribution in https://github.com/pyrater/TARS-AI/pull/10
- @BiosUp made their first contribution in https://github.com/pyrater/TARS-AI/pull/12
Full Changelog: pyrater/TARS-AI@1.0...2.0
RELEASE v1.0: Hello World, Hello TARS!
Version: TARS-AI 1.0
We are thrilled to announce the official release of TARS-AI 1.0, the first version of our highly anticipated intelligent assistant designed for local deployment, inspired by the iconic TARS robot from Interstellar. This marks a significant step toward integrating advanced AI capabilities into personalized, everyday tasks while maintaining privacy and security.
Key Features
Fully Localized AI
Runs entirely on local hardware, ensuring data privacy and independence from cloud services.
TARS Personality Integration
TARS-AI embodies the humor, utility, and wit of the original TARS character, making interactions not just efficient but also engaging.
Modular Design
Supports a variety of modules, including natural language understanding, task automation, and information retrieval, all customizable to your specific needs.
Real-Time Interaction
Provides dynamic, context-aware responses, allowing seamless conversational experiences.
Expandable Hardware Integration
Easily integrates with external sensors, actuators, and robotics systems, opening doors to home automation, robotics, and more.
Configurable AI Ethics and Humor Levels
Adjust personality traits like humor and honesty to tailor TARS-AI to your preferences.
Developer-Friendly Framework
Built with an open API and extensive documentation, enabling developers to add custom functionalities effortlessly.
What's Changed
- RELEASE v1.0: Hello World, Hello TARS! by @alexander-wang03 in pyrater#1
- ENVSETUP.md Azure TTS Instructions Update by @alexander-wang03 in https://github.com/pyrater/TARS-AI/pull/2
New Contributors
- @alexander-wang03 made their first contribution in pyrater#1
Full Changelog: https://github.com/pyrater/TARS-AI/commits/1.0