Releases: TARS-AI-Community/TARS-AI
OS Amelia — Major Milestone
The initial OS Amelia release is a large architectural update bringing a Pi compatibility layer, face recognition groundwork, an app system, and a screensaver restructuring.
New Features
Skills Plugin System
Replaced the monolithic ~530-line if-elif dispatch chain in module_llm.py with an auto-discovered plugin architecture.
- Each LLM tool is a standalone `skill_*.py` file in `src/skills/`
- Skills are auto-discovered at startup — drop a file in, restart, it works
- Each skill exports a `SKILL` dict (name + LLM prompt definition + optional examples) and an `execute()` function
- Skill prompt definitions and few-shot examples are dynamically injected into the system prompt
- 19 skills: web_search, capture_camera_view, take_photo, execute_movement, generate_image, adjust_persona, identify_speaker_name, volume, browser, home_assistant, system_control, launch_retropie, reminder, discord, sandbox_exec, network_camera, generate_music, tars_radio, example
- `module_engine.py` (`FUNCTION_REGISTRY`) removed entirely
- Multi-skill fix: when multiple skills return results in one LLM turn, replies are joined instead of overwriting
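The auto-discovery loop described above can be sketched roughly as follows — assuming each plugin is a `skill_*.py` file exporting a `SKILL` dict and an `execute()` function. The file layout and validation here are illustrative, not the project's exact API.

```python
# Hypothetical sketch of skill plugin auto-discovery.
import importlib.util
from pathlib import Path

def discover_skills(skills_dir):
    """Load every skill_*.py file and return {name: module} for valid skills."""
    registry = {}
    for path in sorted(Path(skills_dir).glob("skill_*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        # A valid skill must export a SKILL dict and a callable execute()
        if isinstance(getattr(mod, "SKILL", None), dict) and callable(
            getattr(mod, "execute", None)
        ):
            registry[mod.SKILL["name"]] = mod
    return registry
```

Dropping a new `skill_*.py` file into the directory and restarting would make it appear in the registry with no dispatch-chain edits — the property the release highlights.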
Streaming LLM + Sentence TTS Pipeline
TARS now speaks after the first sentence instead of waiting for the full LLM reply.
- `SentenceTTSPipeline` splits streaming LLM tokens at sentence boundaries (min 20 chars, min clause 12 chars)
- Voice mode and web UI share the same pipeline backend
- Device speakers and web browser play in parallel with LLM generation
- Real-time text tokens appear in the web UI as they stream
- Follow-up message fix: web search and other tools that modify the reply now emit and speak the updated text correctly
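The sentence-boundary chunking above can be sketched as a generator over the token stream, using the release's stated 20-character minimum. The real `SentenceTTSPipeline` also handles clause splits and TTS dispatch; this is a minimal illustration.

```python
# Sketch: emit speakable sentences as soon as a boundary appears in the stream.
import re

MIN_SENTENCE_CHARS = 20  # stated minimum sentence length in this release

def stream_sentences(tokens):
    """Yield speakable chunks without waiting for the full LLM reply."""
    buffer = ""
    for tok in tokens:
        buffer += tok
        # Emit the first boundary whose prefix is long enough; a too-short
        # sentence is merged into the next one instead of being spoken alone.
        for m in re.finditer(r"[.!?]\s", buffer):
            if m.end() >= MIN_SENTENCE_CHARS:
                yield buffer[: m.end()].strip()
                buffer = buffer[m.end():]
                break
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends
```

Each yielded chunk can be handed to TTS immediately, so speech starts after the first sentence while the LLM keeps generating.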
Speaker ID Rewrite
Complete rewrite of voice identification.
- Three-tier threshold model: consistency (0.55), soft match (0.65), high-confidence (0.75)
- Atomic saves: writes temp file then renames to prevent corruption
- Auto-merge: duplicate unknown profiles are merged automatically
- Real names in memory: recalled memories and prompts now show actual speaker names instead of `unknown`
- `wait_for_identification()`: LLM processing waits up to 2.0 s for speaker ID to resolve before displaying the user message
- Deferred name correction: speaker name on UI message bubble updates via socket event after identification completes
- Barge-in fix: pre-roll audio chunks excluded from speaker ID embeddings to prevent TARS's own TTS voice from poisoning speaker detection
- Browser audio: browser mic audio submitted to speaker ID for web UI voice mode
- Default threshold raised from 0.5 → 0.75
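The three-tier threshold model can be sketched as a simple classification over the similarity score, using the thresholds stated above (0.55 / 0.65 / 0.75); the tier labels here are illustrative names, not the project's.

```python
# Hedged sketch of the three-tier threshold model for voice similarity scores.
CONSISTENCY = 0.55      # same voice as recent turns, identity unsure
SOFT_MATCH = 0.65       # likely match; may need corroboration
HIGH_CONFIDENCE = 0.75  # safe to attach the speaker's name

def classify_similarity(score):
    """Map a similarity score against a stored voice profile to a tier."""
    if score >= HIGH_CONFIDENCE:
        return "high_confidence"
    if score >= SOFT_MATCH:
        return "soft_match"
    if score >= CONSISTENCY:
        return "consistent"
    return "unknown"
```

Raising the default from 0.5 to 0.75 means only the top tier confirms identity outright, which trades a few missed matches for far fewer false attributions.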
Dashboard Tab
New dashboard tab in the web UI with 5 sub-tabs:
- Graph — D3.js memory/knowledge graph with detail drawer
- Mood — Emotion bar chart, timeline, activity heatmap
- Log — Interaction audit trail
- Topics — Topic extraction table and network graph
- Prompt — Full prompt debugger with expand/collapse
Backed by module_dashboard_data.py: logs every interaction with timestamp, emotion, speaker, and LLM prompt. Tracks emotional state with half-life decay for the radar chart.
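Half-life decay of tracked emotion can be sketched in one line; the 300-second half-life below is an assumed value for illustration, not the module's actual constant.

```python
# Sketch of emotional-state decay as module_dashboard_data.py is described.
def decayed_intensity(initial, elapsed_s, half_life_s=300.0):
    """Exponential decay: the tracked emotion halves every half_life_s seconds."""
    return initial * 0.5 ** (elapsed_s / half_life_s)
```

Each interaction would bump the relevant emotion's intensity, and the chart reads the decayed value, so recent events dominate without old ones vanishing abruptly.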
Heartbeat System & Reminder Skill
New module_heartbeat.py provides a lightweight scheduler for one-shot and recurring timed tasks.
- New `skill_reminder.py` lets TARS set reminders via voice/text ("remind me in 10 minutes to…")
- Task persistence: reminders survive restarts
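A one-shot/recurring scheduler in the spirit of `module_heartbeat.py` can be sketched with a priority queue; the `Heartbeat` class, its `tick()` API, and the return values are assumptions for illustration.

```python
# Hypothetical sketch of a lightweight one-shot/recurring task scheduler.
import heapq

class Heartbeat:
    """Priority-queue scheduler: tick(now) runs every task that is due."""
    def __init__(self):
        self._queue = []  # entries: (due_time, tiebreak, callback, interval)

    def schedule(self, due_time, callback, interval=None):
        # id(callback) breaks ties so heapq never compares callables
        heapq.heappush(self._queue, (due_time, id(callback), callback, interval))

    def tick(self, now):
        fired = []
        while self._queue and self._queue[0][0] <= now:
            due, _, cb, interval = heapq.heappop(self._queue)
            fired.append(cb())
            if interval is not None:          # recurring task: re-arm
                self.schedule(due + interval, cb, interval)
        return fired
```

Persistence across restarts would then amount to serializing the queue entries to disk on schedule and reloading them at boot.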
Identity Coordinator (module_identity.py)
Fuses voice ID and face recognition into a unified identity context.
- Voice is authoritative (who is speaking)
- Falls back to face recognition if exactly one known person is visible
- Provides a unified `get_identity_context()` for prompt injection
- `module_speaker_id.py` and face recognition feed into it
Atomik Wake Word — CNN/ONNX + Multi-Gate Pipeline
Complete rewrite of the wake word system from simple template matching to a dual-mode architecture.
Template mode (cosine similarity — preferred):
- Records 5 template utterances of the wake word during setup
- Augments templates with time-stretch, pitch-shift, gain variation, noise addition
- MFCC cosine similarity matching against recorded templates
- Most reliable for custom wake words out of the box
Model mode (CNN/ONNX neural network):
- Pre-trained ONNX model loaded from `tts/{wake_word}.onnx`
- `TinyClassifier` — pure-numpy 1D CNN with Adam optimizer, trained on-device as a fallback
- 5-gate pre-filter pipeline: energy, adaptive SNR, crest factor, spectral speech band, low-frequency rejection
- Confirmation window (2 above-threshold scores in 5 checks)
- Post-detection `_is_speech_like()` validation (ZCR, energy dynamics, syllable counting)
- Speed-variation retry for near-threshold scores
- MFCC + delta + delta-delta features (39 channels) for ONNX models
- Synthetic negative audio generation for training (40 clips: white/pink noise, babble, tonal, rumble, clicks)
Config: atomik_mode = auto | model | template
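The core of template mode is cosine similarity between an MFCC feature vector of the incoming audio and each recorded template. Feature extraction itself is out of scope here; the vectors are assumed to be flattened MFCC matrices of equal length, and the 0.85 threshold is an assumed value.

```python
# Sketch of template-mode wake word matching via cosine similarity.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def matches_wake_word(features, templates, threshold=0.85):
    """Fire when the best template similarity clears the threshold."""
    best = max((cosine_similarity(features, t) for t in templates), default=0.0)
    return best >= threshold
```

Augmenting the five recorded templates (time-stretch, pitch-shift, gain, noise) widens the set being maxed over, which is why template mode is robust for custom wake words out of the box.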
Wake Word Gates
Post-detection gate system applied to all wake word backends (atomik, sherpa-onnx, fastrtc).
Three independent gates configured in [STT]:
- Speaker verification (`vad_speaker_verify`) — Extracts a voice embedding from the wake audio and matches it against enrolled speakers. Modes: `off`, `any` (any enrolled speaker), or a specific speaker name. Uses a lower threshold (0.60) than normal speaker ID (0.75).
- Presence gate (`vad_presence_gate`) — Requires a face on camera via `IdentityManager`. Modes: `off`, `any` (any face), `known` (enrolled face only).
- Transcript verification (`vad_transcript_verify`) — Transcribes the wake audio via the best available local STT and confirms the wake word was actually spoken, with exact + fuzzy matching.
All gates are optional and disabled by default. They run synchronously after initial detection and before triggering the wake word callback.
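The synchronous gate chain can be sketched as a loop that only consults gates whose config mode is not `off`; the gate callables and config dict here are illustrative stand-ins for the `[STT]` settings.

```python
# Sketch: run every enabled gate after raw detection; any failure vetoes.
def run_gates(config, wake_audio, gates):
    """Return True only if every enabled gate passes.

    config: {"vad_speaker_verify": "off" | ..., ...}
    gates:  {config_key: callable(mode, wake_audio) -> bool}
    """
    for key, check in gates.items():
        mode = config.get(key, "off")
        if mode == "off":
            continue                  # disabled gates never block
        if not check(mode, wake_audio):
            return False              # any failing enabled gate vetoes
    return True
```

Because the default mode for every gate is `off`, the chain is a no-op unless the user opts in, matching the release's "disabled by default" behavior.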
New Infrastructure Modules
Four new modules extracted from the growing codebase:
- `module_mic.py` (~561 lines) — Shared microphone utilities. Detects the native sample rate, provides resampling helpers, and an `_AudioHub` singleton for shared mic access across consumers.
- `module_router.py` (~186 lines) — Message routing. Tracks where the user is interacting (voice/web UI/Discord) and delivers responses to the correct target.
- `module_state.py` (~92 lines) — Thread-safe application state machine. `TarsState` enum (BOOTING, STANDBY, LISTENING, THINKING, TALKING) with listener callbacks.
- Audio output autodetection (`module_tts.py`) — Automatically discovers and selects the correct audio output device at startup. Prioritizes USB audio > hardware devices > virtual ALSA devices. Skips HDMI outputs that would route audio to a monitor instead of the speaker. Used globally by TTS, indicator beeps, and radio streaming.
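The stated priority order (USB audio first, then other hardware devices, HDMI skipped) can be sketched as a ranking function; the device dicts below are a simplified stand-in for what an ALSA enumeration would return.

```python
# Sketch of audio output autodetection with the release's priority order.
def pick_output_device(devices):
    """devices: list of {"name": str, "is_hardware": bool}; returns best or None."""
    # HDMI outputs would route audio to a monitor, so they are excluded outright
    candidates = [d for d in devices if "hdmi" not in d["name"].lower()]

    def rank(d):
        if "usb" in d["name"].lower():
            return 0                 # USB audio wins
        if d["is_hardware"]:
            return 1                 # then other hardware outputs
        return 2                     # virtual ALSA devices last

    return min(candidates, key=rank, default=None)
```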
Raspberry Pi Zero 2 Support
First release with full Pi Zero 2 W support — a low-cost ($15) entry point for TARS-AI.
- New `PIZERO2` device profile with tailored capability gating
- Cloud-only backends: OpenAI STT, cloud TTS (ElevenLabs, OpenAI), cloud LLM
- Lite keyword-based memory (no embeddings, no vector search)
- Minimal dependency footprint (~300MB vs ~2.5GB on Pi5)
- Auto-detected via `/proc/device-tree/model` at runtime
- Pi version stored in `src/memory/pi_version` for subsequent installs
- Graceful degradation: features that require more compute are automatically disabled
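Detection from the device-tree model string can be sketched as a simple substring match; the profile names mirror the release notes, while the parsing and the `GENERIC` fallback are assumptions.

```python
# Sketch of mapping /proc/device-tree/model contents to a device profile.
def detect_pi_profile(model_string):
    """Map the device-tree model string to a device profile name."""
    model = model_string.lower()
    if "pi zero 2" in model:
        return "PIZERO2"
    if "raspberry pi 5" in model:
        return "PI5"
    return "GENERIC"
```

At runtime the string would come from something like `Path("/proc/device-tree/model").read_text()`, and the resolved profile would drive the capability gating described above.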
New Skills
Six new skill plugins added:
- `skill_discord.py` (~248 lines) — Discord bot integration, replaces the old `module_discord.py`
- `skill_sandbox_exec.py` (~314 lines) — Sandboxed code execution
- `skill_network_camera.py` (~436 lines) — Network camera access
- `skill_generate_music.py` (~411 lines) — Music generation
- `skill_tars_radio.py` (~1,029 lines) — TARS radio functionality
- `skill_example.py` (~110 lines) — Example/template skill for reference
Modules Inlined or Removed
Several standalone modules were merged into skills or removed:
| Module | Status |
|---|---|
| `module_engine.py` | Deleted (was already dead code) |
| `module_secrets.py` | Deleted |
| `module_browser.py` | Inlined into `skill_browser.py` |
| `module_volume.py` | Inlined into `skill_volume.py` |
| `module_discord.py` | Replaced by `skill_discord.py` |
| `module_homeassistant.py` | Inlined into `skill_home_assistant.py` |
Web UI: Browser Voice Mode
Full voice chat in the browser:
- Web Audio API captures microphone at 16kHz
- Audio sent as base64 via the SocketIO `browser_audio` event
- Local sherpa-onnx transcription with OpenAI Whisper fallback
- RMS silence rejection filters empty audio
- Bot replies streamed as tokens (`bot_token`) + audio chunks (`bot_audio_chunk`)
- `ChunkedAudioPlayer` handles queue-based playback with barge-in
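The inbound half of this path — decode base64 PCM from the socket payload and drop near-silent clips by RMS — can be sketched as below. The assumption that the payload is little-endian 16-bit PCM, and the RMS threshold of 500 on int16 samples, are illustrative, not the project's exact values.

```python
# Sketch of the browser_audio decode + RMS silence rejection step.
import base64
import math
import struct

def decode_browser_audio(b64_payload):
    """Return a list of int16 samples from a base64 little-endian PCM payload."""
    raw = base64.b64decode(b64_payload)
    usable = len(raw) // 2 * 2                     # drop any trailing odd byte
    return list(struct.unpack(f"<{usable // 2}h", raw[:usable]))

def is_silent(samples, rms_threshold=500):
    """Reject clips whose root-mean-square energy is below the threshold."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < rms_threshold
```

Clips failing the RMS check would be dropped before transcription, saving an STT round-trip on empty audio.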
Web UI: Themes
15 CSS themes, auto-discovered from src/www/static/css/themes/:
arctic, chatgpt, claude, cyberpunk, default, gemini, gi_joe, glados, grogu, grok, hal9000, linux, microsoft, retro, starwars
Selectable via [ACCESS] webui_theme in config.ini.
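Auto-discovery here plausibly amounts to scanning the themes directory for `.css` files and using each filename stem as a theme name — a sketch, not the project's exact implementation.

```python
# Sketch: any .css file dropped into the themes directory becomes a theme.
from pathlib import Path

def discover_themes(themes_dir):
    """Return sorted theme names derived from *.css filenames."""
    return sorted(p.stem for p in Path(themes_dir).glob("*.css"))
```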
Web UI: Login Authentication
- Password-protected web interface
- Session-based auth with `flask.session`
- Random initial password generated on first install
- Dedicated `login.html` page
- Security fix: Flask secret key is now randomized (was hardcoded, allowing auth bypass)
- Security fix: emotions endpoint XSS-hardened
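The secret-key fix matters because Flask signs session cookies with this key: a hardcoded key lets anyone forge a logged-in session. A common pattern — sketched here with an illustrative file location — is to generate the key once and persist it:

```python
# Sketch of a persisted, randomized Flask secret key (file path is illustrative).
import secrets
from pathlib import Path

def load_or_create_secret_key(path):
    """Return the stored secret key, generating and saving one on first boot."""
    key_file = Path(path)
    if key_file.exists():
        return key_file.read_text().strip()
    key = secrets.token_hex(32)       # 256 bits of randomness, hex-encoded
    key_file.write_text(key)
    return key
```

Persisting the key keeps existing sessions valid across restarts while still being unique per install.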
Web UI: New Tabs and Controls
- Body tab — Combined arm/leg servo sliders
- WiFi tab — Network scanning, connect/disconnect, hotspot toggle (TARS-Setup AP)
- **SYS (Nex...
4.0
🚀 RELEASE v4.0: TARS V2, Hardware Overhaul & Smarter AI Features!
The TARS-AI 4.0 release introduces a major leap forward with TARS V2 hardware, improved AI responsiveness, advanced battery monitoring, enhanced UI controls, and numerous 3D design refinements. This is the most feature-complete TARS-AI version to date, blending faster neural processing with smarter integrations and a fully reworked mechanical design.
🔥 Main Highlights
TARS V2 Hardware Upgrade
New main leg mounts and mechanical refinements for better stability.
Static TARS case option (no motors) and updated design without heat inserts.
Fixed lid clearance issues with the camera and speaker mounts.
Hybrid RAG & Neural Net Enhancements
Significantly faster Hybrid RAG response times.
Integrated neural net support for smarter motion control and decision-making.
UI & Web Interface Upgrades
New motion control tab in ChatUI for seamless real-time control.
Refined UI font sizes, fullscreen toggle, and better overall visual polish.
Battery & Power Monitoring
INA260-based battery support for real-time voltage and current tracking.
FPS variable for smoother display and control updates.
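Reading the INA260 comes down to scaling its raw registers by fixed LSB sizes — 1.25 mV per bus-voltage bit and 1.25 mA per current bit per the INA260 datasheet. The register access itself (I²C) is omitted in this sketch.

```python
# Sketch of INA260 raw-register conversion using the datasheet LSB sizes.
def bus_voltage_v(raw):
    """Bus voltage register: 1.25 mV per bit."""
    return raw * 1.25e-3

def current_a(raw_signed):
    """Current register: 1.25 mA per bit, two's-complement signed."""
    return raw_signed * 1.25e-3
```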
🔥 New Features & Upgrades
Azure Voice Modifications: Enhanced voice synthesis and flexibility.
OpenAI TTS Option: Additional TTS engine choice for customizable voices.
Config.ini Improvements: Toggle camera directly from configuration settings.
3D CAD Files: Added TARS CAD STEP file and corrections to printed parts.
⚙️ Technical & Performance Improvements
Fixed 90° and 270° rotation issues with the new layout logic.
Optimized engine and BT controller modules for smoother performance.
Numerous 3D print corrections and incremental V2 updates for reliability.
🔄 What’s Changed
Added image to README by @mossed1
Merged development into main branch by @pyrater
Improved Hybrid RAG response and Picovoice support by @latishab
UI improvements and motion control tab by @pyrater & @atomikspace
Font size fixes, fullscreen toggle, and camera config toggle by @atomikspace
Battery monitoring via INA260 by @atomikspace
TARS V2 hardware updates and multiple 3D print corrections by @atomikspace
Azure voice modifications and OpenAI TTS option by @atomikspace
Fixed JSON output format for movement LLM calls by @Autosuffisant
Fixed lid clearance issues with camera and speaker mount by @mossed1
Updated engine and BT controller modules by @mossed1
🎉 New Contributors
@mossed1 made their first contribution
@Autosuffisant made their first contribution
Special thanks to @atomikspace, who has done an enormous amount of heavy lifting (most of the changes) to get the V2 rework out the door.
📜 Full Changelog: 3.0 → 4.0
👉 View All Commits
3.0
🚀 RELEASE v3.0: Faster, Smarter, and Now With a Fancy WebUI!
The TARS-AI 3.0 release brings a massive overhaul with improved speech capabilities, a sleek new chat WebUI, better memory handling, and countless refinements. This is the most powerful and user-configurable version of TARS-AI yet.
🔥 Main Highlights
Text-to-Speech (TTS) & Speech-to-Text (STT) Overhaul
Refactored TTS and STT for better efficiency and flexibility.
Added Eleven Labs TTS for high-quality voice synthesis.
Integrated Silero VAD (Voice Activity Detection) for smarter audio processing.
DEEPINFRA LLM option added.
TTS now streams per sentence in chunks, making responses significantly faster.
🔥 New Features & Upgrades
Chat WebUI: Fully interactive web-based chat interface for easier communication.
Expanded User Configuration: More settings added to config.ini for deeper customization.
Message Queue System: Terminal now processes messages more efficiently with queuing.
Memory / RAG Enhancements: Upgraded retrieval-augmented generation (RAG) from Latisha, making long-term recall more accurate and fluid.
Wiki Created: Comprehensive documentation for setup, features, and customization.
🔥 File & Structure Improvements
Reorganized and updated 3D files for better organization.
Updated Character Card: More options for defining character traits and personalities.
⚙️ Technical & Performance Improvements
Faster Response Time: TTS chunked streaming greatly reduces lag.
Bug Fixes Galore: Squashed countless bugs for a smoother experience.
Optimized Memory Handling: Smarter memory system improves both speed and accuracy.
Refactored Codebase: Cleaner, more maintainable structure across core modules.
🔄 What's Changed
Added Eleven Labs TTS by @latishab
Added DEEPINFRA LLM by @latishab
Integrated Silero VAD by @latishab
Implemented Chat WebUI by @pyrater
Refactored TTS & STT pipeline by @pyrater
Improved RAG Memory from Latisha by @latishab
Message Queue System for Terminal by @pyrater
Updated and reorganized 3D files by @pyrater
Created Wiki for Documentation by @latishab
Readme Updates by @mossed1
🎉 New Contributors
@pyrater made their first contribution
@xanomanox made their first contribution
@DaftNinja made their first contribution
@mspierg made their first contribution
@holmesha made their first contribution
@BiosUp made their first contribution
@latishab had many updates merged by @pyrater.
📜 Full Changelog: 2.0 → 3.0
👉 View All Commits
Get ready for the fastest, smartest, and most customizable version of TARS-AI yet! 🚀
RELEASE v2.0: Now with 100% more sarcasm and 0% chances of docking mishaps!
TARS-AI 2.0 Development Branch Update
The TARS-AI 2.0 development branch introduces several enhancements over the initial 1.0 release, including:
🚀 Core Additions
Speech-to-Text (STT) Enhancements:
- Integrated Whisper for local STT alongside Vosk and external server options.
- Added STT sensitivity configuration for improved accuracy.
- Comprehensive improvements to the STT system for better transcription.
- Significant speed improvements.
- Dynamic microphone quality selection based on hardware.
Character Customizations:
- Persistent personality settings (e.g., humor and tone) for dynamic character interaction.
- Expanded character settings and behavior customization options.
Function Calling:
- Dual methods for function calls: Naive Bayes (NB) and LLM-based approaches.
Voice-Controlled Movement:
- Enabled voice commands to control robotic movements with precise mapping.
Image Generation:
- Integrated DALL·E and Stable Diffusion for AI-powered image creation.
Volume Control:
- Fine-tuned volume adjustments through both voice commands and external configurations. Credit: @MSKULL
Home Assistant Integration:
- Seamless connection with smart home systems.
⚙️ Technical Improvements
- Reworked LLM function into its own module.
- Reworked LLM function to be compatible with thinking models (Deepseek R1).
- Reworked build prompt function for easy importing.
- Reworked memory module to ensure correct prompt and memory management.
- Reworked tokenization for proper counts.
- Renamed STL files for better readability.
- Many fixes and tweaks for better usability.
- Over 196 commits.
Override Encoding Model:
- Enhanced compatibility with models using `override_encoding_model`.
TTS Fixes:
- Resolved issues with special characters in Text-to-Speech.
What's Changed
- Update config.ini.template by @DaftNinja in https://github.com/pyrater/TARS-AI/pull/3
- V9-misc-mods by @xanomanox in https://github.com/pyrater/TARS-AI/pull/4
- add M3 grub screws to fasteners by @xanomanox in https://github.com/pyrater/TARS-AI/pull/5
- updated parts list (in progress) by @xanomanox in https://github.com/pyrater/TARS-AI/pull/6
- Feature/volume by @mspierg in https://github.com/pyrater/TARS-AI/pull/8
- Update training_data.csv by @holmesha in https://github.com/pyrater/TARS-AI/pull/10
- Dev by @holmesha in https://github.com/pyrater/TARS-AI/pull/11
- added openai to requirements.txt by @BiosUp in https://github.com/pyrater/TARS-AI/pull/12
New Contributors
- @DaftNinja made their first contribution in https://github.com/pyrater/TARS-AI/pull/3
- @xanomanox made their first contribution in https://github.com/pyrater/TARS-AI/pull/4
- @mspierg made their first contribution in https://github.com/pyrater/TARS-AI/pull/8
- @holmesha made their first contribution in https://github.com/pyrater/TARS-AI/pull/10
- @BiosUp made their first contribution in https://github.com/pyrater/TARS-AI/pull/12
Full Changelog: pyrater/TARS-AI@1.0...2.0
RELEASE v1.0: Hello World, Hello TARS!
Version: TARS-AI 1.0
We are thrilled to announce the official release of TARS-AI 1.0, the first version of our highly anticipated intelligent assistant designed for local deployment, inspired by the iconic TARS robot from Interstellar. This marks a significant step toward integrating advanced AI capabilities into personalized, everyday tasks while maintaining privacy and security.
Key Features
Fully Localized AI
Runs entirely on local hardware, ensuring data privacy and independence from cloud services.
TARS Personality Integration
TARS-AI embodies the humor, utility, and wit of the original TARS character, making interactions not just efficient but also engaging.
Modular Design
Supports a variety of modules, including natural language understanding, task automation, and information retrieval, all customizable to your specific needs.
Real-Time Interaction
Provides dynamic, context-aware responses, allowing seamless conversational experiences.
Expandable Hardware Integration
Easily integrates with external sensors, actuators, and robotics systems, opening doors to home automation, robotics, and more.
Configurable AI Ethics and Humor Levels
Adjust personality traits like humor and honesty to tailor TARS-AI to your preferences.
Developer-Friendly Framework
Built with an open API and extensive documentation, enabling developers to add custom functionalities effortlessly.
What's Changed
- RELEASE v1.0: Hello World, Hello TARS! by @alexander-wang03 in pyrater#1
- ENVSETUP.md Azure TTS Instructions Update by @alexander-wang03 in https://github.com/pyrater/TARS-AI/pull/2
New Contributors
- @alexander-wang03 made their first contribution in pyrater#1
Full Changelog: https://github.com/pyrater/TARS-AI/commits/1.0