Existing drone platforms monitor missions. Our platform learns from missions.
Every failure, mitigation, outcome, agent decision, and evaluation becomes operational knowledge that improves future mission recommendations.
TARS is a runtime feedback system for autonomous drones. It lets drone agents continuously trace, evaluate, and introspect their own behavior — detecting telemetry anomalies, decision inconsistencies, and mission failures to iteratively refine future actions.
Built with PX4, Gazebo, MAVSDK, Python, Gemini, Phoenix, Neo4j, and Redis.
TARS is organized as a layered pipeline. Each layer builds on the one below it:
| Layer | What It Does | Port |
|---|---|---|
| Phase 1 — Mission Foundation | Runs PX4 SITL + Gazebo headless simulations, collects async telemetry via MAVSDK, injects faults (GPS block, battery drain, sensor cascade, etc.), and writes structured JSON output. | — |
| Phase 2 — Mission Replay | Imports mission JSON into PostgreSQL, provides ordered replay frames with elapsed timing, and exposes a FastAPI REST API for mission queries. | 8000 |
| Phase 3 — State Engine | Transforms replay frames into classified mission states (phase, health, risk score) and stores them as Redis timelines. Deterministic phase classification and additive risk scoring. | 8002 |
| Phase 4 — Incident Engine | Evaluates state timelines against 7 rule types across 4 severity levels. Collapses consecutive matches into bounded incidents with gap-based merging and persistence thresholds. | 8003 |
| Phase 5 — Gemini Reasoning | Analyzes bounded incidents using Google Gemini (via ADK) to produce structured, advisory-only root-cause assessments. Provider-neutral interface with versioned prompts and control-command rejection at the model boundary. | 8004 |
| Phase 6 — Phoenix Integration | Instruments the reasoning layer with OpenTelemetry tracing and exports spans to Arize Phoenix. Produces parent-child span hierarchies with OpenInference semantic conventions and configurable content capture (full / metadata / disabled). | — |
| Phase 7 — Operational Memory | Projects bounded facts from Phases 2, 4, and 5 into a Neo4j graph. Connects missions → incidents → root causes → mitigations → outcomes. Answers "Have we seen this before?" with provenance-preserving history queries. | 8005 |
| Phase 8 — Phoenix MCP | Analysis-only self-introspection via Phoenix traces. Lets the reasoning agent inspect its own prior reasoning through 3 read-only MCP tools (search, summary, compare). Fail-open design: Phoenix unavailability never blocks reasoning. 4 content modes, secret redaction, not_an_evaluation enforcement. | — |
| Phase 9 — Evaluation Layer | Measures reasoning quality against bounded ground-truth labels, mission outcomes, and incident facts. Produces durable, inspectable evaluation scores (root-cause accuracy, recommendation quality, consistency, false positives/negatives) without changing operational behavior. | 8006 |
| Phase 10 — Learning Engine | Turns evaluated mission history into candidate operational knowledge. Aggregates Phase 9 evaluations, Phase 7 operational memory, and safe trace metadata into candidate knowledge with evidence, confidence, and provenance. Candidate knowledge is not truth — it is input to Phase 11 validation. | 8007 |
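Each phase that lists a port above runs its own FastAPI service, with its own test suite and configuration. Phases 6 and 8 hook into the Phase 5 reasoning service instead of exposing a separate API. Phases communicate over HTTP, with no shared databases and no tight coupling.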
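
The Phase 4 collapse step (consecutive matches, gap-based merging, persistence thresholds) can be pictured with a small sketch. This is an illustration only, not the code in `detector.py`: the function name `collapse_matches`, the `Incident` dataclass, and the `max_gap_ms` / `min_duration_ms` parameters and their default values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    start_ms: int
    end_ms: int
    match_count: int


def collapse_matches(match_times_ms: List[int],
                     max_gap_ms: int = 2000,
                     min_duration_ms: int = 3000) -> List[Incident]:
    """Collapse rule-match timestamps into bounded incidents.

    Matches no more than max_gap_ms apart are merged into one incident
    (gap-based merging). Incidents shorter than min_duration_ms are
    dropped (persistence threshold). Both thresholds are hypothetical.
    """
    incidents: List[Incident] = []
    for t in sorted(match_times_ms):
        if incidents and t - incidents[-1].end_ms <= max_gap_ms:
            # Close enough to the previous match: extend the open incident.
            incidents[-1].end_ms = t
            incidents[-1].match_count += 1
        else:
            # Gap too large (or first match): open a new incident.
            incidents.append(Incident(start_ms=t, end_ms=t, match_count=1))
    # Persistence threshold: keep only incidents that lasted long enough.
    return [i for i in incidents if i.end_ms - i.start_ms >= min_duration_ms]


if __name__ == "__main__":
    times = [1000, 2000, 3000, 4000, 5000, 20000, 30000, 31000, 32000, 33000]
    for incident in collapse_matches(times):
        print(incident)
```

With these assumed thresholds, the isolated match at 20000 ms is discarded as transient, while the two sustained runs each become one bounded incident.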
- Docker 24+ with Docker Compose v2
- Python 3.10+
- ~10GB disk space for PX4 Docker image (one-time download)
- Optional: QGroundControl for visual drone tracking
cd ~/Desktop/Projects/TARS
# Copy environment config
cp .env.example .env
# Create a Python virtual environment and install dependencies
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt

# First run builds the Docker image (~15-30 min, downloads PX4 source + compiles)
./scripts/start_simulation.sh
# Wait until you see "Ready for takeoff" in the logs

Note: The first build takes a while because it clones and compiles the entire PX4 firmware. Subsequent starts are fast (~10 seconds).

# Activate the venv first
. .venv/bin/activate
# Run the default square mission with telemetry collection
./scripts/run_mission.sh
# Or with a custom mission ID
MISSION_ID=my_first_mission ./scripts/run_mission.sh
# Run a mission with a fault scenario (faults recorded in output JSON)
FAULT_SCENARIO=s1 ./scripts/run_mission.sh

# Telemetry is saved as JSON in output/
cat output/mission_*.json | python3 -m json.tool | head -50

# Activate the venv first
. .venv/bin/activate
# Interactive fault injection while a mission is running
PYTHONPATH=src .venv/bin/python3 -m tars.phase1.fault_injector

./scripts/start_simulation.sh --stop

TARS/
|-- plans/ # Architecture and planning docs
| |-- phase-1-mission-foundation.md
| |-- phase-2-mission-replay-system.md
| |-- phase-3-state-engine.md
| |-- phase-4-incident-engine.md
| |-- phase-5-gemini-reasoning-layer.md
| |-- phase-6-phoenix-integration.md
| |-- phase-7-neo4j-operational-memory.md
| +-- phase-8-phoenix-mcp.md
|-- docker/ # Docker setup
| |-- Dockerfile.px4-sitl # PX4 SITL + Gazebo headless
| +-- docker-compose.yml # PX4 SITL + PostgreSQL + Redis + Neo4j
|-- src/
| +-- tars/
| |-- phase1/ # Phase 1 -- Mission Foundation
| | |-- telemetry_collector.py # Async telemetry streaming via MAVSDK
| | |-- mission_runner.py # Autonomous mission execution
| | |-- fault_injector.py # Fault injection + scenarios
| | +-- models/
| | +-- telemetry.py # Pydantic data models
| |-- phase2/ # Phase 2 -- Mission Replay System
| | |-- api.py # FastAPI app and routes
| | |-- config.py # Environment settings
| | |-- database.py # Async SQLAlchemy engine/session
| | |-- importer.py # Phase 1 JSON import + validation
| | |-- replay.py # Replay frame construction
| | |-- service.py # Mission query orchestration
| | +-- models/
| | |-- db.py # SQLAlchemy ORM tables
| | +-- schemas.py # API request/response schemas
| |-- phase3/ # Phase 3 -- State Engine
| | |-- api.py # FastAPI app (port 8002)
| | |-- config.py # Environment settings
| | |-- models.py # Pydantic models and enums
| | |-- phase_classifier.py # Deterministic phase rules
| | |-- risk.py # Risk scoring and health assessment
| | |-- state_processor.py # Frame-to-state transformation
| | |-- store.py # Async Redis state store
| | |-- replay_client.py # HTTP client for Phase 2 API
| | +-- service.py # Processing orchestration
| |-- phase4/ # Phase 4 -- Incident Engine
| | |-- api.py # FastAPI app (port 8003)
| | |-- config.py # Environment settings
| | |-- models.py # Incident enums and schemas
| | |-- rules.py # Deterministic state rules
| | |-- statistics.py # Rolling windows and trend detection
| | |-- detector.py # Incident collapser
| | |-- store.py # Async Redis incident store
| | |-- state_client.py # HTTP client for Phase 3 API
| | +-- service.py # Detection orchestration
| |-- phase5/ # Phase 5 -- Gemini Reasoning Layer
| | |-- api.py # FastAPI app (port 8004)
| | |-- config.py # Environment settings
| | |-- models.py # Reasoning schemas and provider protocol
| | |-- prompts.py # Versioned system instruction and prompt
| | |-- agent.py # Google ADK Gemini agent configuration
| | |-- provider.py # Gemini + fake reasoning providers
| | |-- incident_client.py # HTTP client for Phase 4 API
| | |-- store.py # Async Redis reasoning store
| | +-- service.py # Reasoning orchestration
| |-- phase6/ # Phase 6 -- Phoenix Integration
| | |-- config.py # PhoenixSettings (env-driven)
| | |-- attributes.py # Stable trace attribute constants
| | +-- tracing.py # TracerProvider setup, OTLP exporter
| |-- phase7/ # Phase 7 -- Operational Memory
| |-- api.py # FastAPI app (port 8005)
| |-- config.py # Environment settings
| |-- models.py # Graph models, enums, request/response
| |-- database.py # Async Neo4j driver lifecycle
| |-- schema.py # Constraints and indexes
| |-- mapper.py # Pure mapping + deterministic IDs
| |-- repository.py # Graph MERGE/MATCH operations
| |-- service.py # Sync + query orchestration
| |-- phase2_client.py # HTTP client for Phase 2 API
| |-- phase4_client.py # HTTP client for Phase 4 API
| |-- phase5_client.py # HTTP client for Phase 5 API
| +-- phase8/ # Phase 8 -- Phoenix MCP Self-Introspection
| |-- config.py # PhoenixMCPSettings (env-driven, disabled by default)
| |-- models.py # Pydantic models, secret redaction, safety bounds
| |-- phoenix_client.py # GraphQL client + FakePhoenixTraceClient
| |-- summarizer.py # Raw trace → safe summary conversion
| |-- mcp_tools.py # 3 read-only MCP tool definitions
| |-- tool_policy.py # IntrospectionPolicy decision engine
| +-- service.py # IntrospectionService orchestration
| +-- phase9/ # Phase 9 -- Evaluation Layer
| |-- api.py # FastAPI app (port 8006)
| |-- config.py # Environment settings + weight validation
| |-- models.py # Evaluation schemas, enums, metric contracts
| |-- database.py # Async SQLAlchemy engine/session
| |-- repository.py # PostgreSQL evaluation persistence
| |-- evaluator.py # Deterministic scoring (root cause, recommendation, consistency)
| |-- ground_truth.py # Multi-source ground-truth resolution
| |-- service.py # Evaluation orchestration
| |-- phoenix_exporter.py # Optional Phoenix eval export
| +-- adapters/
| |-- phase4_client.py # Read-only Phase 4 incident client
| |-- phase5_client.py # Read-only Phase 5 reasoning client
| +-- phase7_client.py # Read-only Phase 7 outcome client
| +-- phase10/ # Phase 10 -- Learning Engine
| |-- api.py # FastAPI app (port 8007)
| |-- config.py # Environment settings + weight validation
| |-- models.py # Learning schemas, enums, candidate contracts
| |-- database.py # Async SQLAlchemy engine/session
| |-- repository.py # PostgreSQL candidate knowledge persistence
| |-- service.py # Learning run orchestration
| |-- evidence_loader.py # Phase 9 + Phase 7 evidence merge
| |-- pattern_miner.py # Deterministic pattern grouping
| |-- scorer.py # Versioned confidence scoring
| |-- statement_templates.py # Cautious association language templates
| +-- adapters/
| |-- phase9_client.py # Read-only Phase 9 evaluation client
| |-- phase7_client.py # Read-only Phase 7 memory client
| +-- phoenix_client.py # Read-only Phoenix trace metadata client
|-- migrations/ # Alembic database migrations
| |-- env.py
| +-- versions/
|-- scripts/
| |-- start_simulation.sh # Launch/stop PX4 simulation
| |-- run_mission.sh # Run a Phase 1 mission
| |-- start_replay_api.sh # Start Phase 2 API server
| |-- import_mission.sh # Import mission JSON via API
| |-- start_state_api.sh # Start Phase 3 State API server
| |-- process_mission_state.sh # Process a mission through Phase 3
| |-- start_incident_api.sh # Start Phase 4 Incident API server
| |-- process_mission_incidents.sh # Detect incidents for a mission
| |-- start_reasoning_api.sh # Start Phase 5 Reasoning API server
| |-- analyze_incident.sh # Analyze an incident through Phase 5
| |-- start_memory_api.sh # Start Phase 7 Memory API server
| |-- sync_mission_memory.sh # Sync a mission into Neo4j graph
| |-- query_similar_incidents.sh # Query similar historical incidents
| |-- start_evaluation_api.sh # Start Phase 9 Evaluation API server
| +-- start_learning_api.sh # Start Phase 10 Learning API server
|-- tests/
| |-- phase2/ # Phase 2 tests
| | |-- test_importer.py
| | |-- test_replay.py
| | +-- test_api.py
| |-- phase3/ # Phase 3 tests
| | |-- test_phase_classifier.py
| | |-- test_risk.py
| | |-- test_state_processor.py
| | |-- test_store.py
| | +-- test_api.py
| |-- phase4/ # Phase 4 tests
| | |-- test_rules.py
| | |-- test_statistics.py
| | |-- test_detector.py
| | |-- test_store.py
| | +-- test_api.py
| |-- phase5/ # Phase 5 tests
| | |-- test_models.py
| | |-- test_prompts.py
| | |-- test_client.py
| | |-- test_provider.py
| | |-- test_store.py
| | |-- test_service.py
| | +-- test_api.py
| |-- phase6/ # Phase 6 tests
| | |-- test_config.py # 31 configuration tests
| | |-- test_tracing.py # 14 tracing bootstrap tests
| | +-- test_reasoning_traces.py # 44 reasoning trace tests
| |-- phase7/ # Phase 7 tests
| | |-- test_models.py # Model validation tests
| | |-- test_mapper.py # Mapping + deterministic ID tests
| | |-- test_clients.py # Upstream HTTP client tests
| | |-- test_repository.py # Graph operation tests
| | |-- test_service.py # Service orchestration tests
| | +-- test_api.py # API endpoint tests
| |-- phase8/ # Phase 8 tests (149 tests)
| | |-- test_config.py # 18 configuration tests
| | |-- test_models.py # 30 model validation tests
| | |-- test_summarizer.py # 17 summarization tests
| | |-- test_phoenix_client.py # 18 fake client tests
| | |-- test_mcp_tools.py # 12 MCP tool tests
| | |-- test_tool_policy.py # 9 policy decision tests
| | +-- test_reasoning_integration.py # 45 integration tests
| |-- phase9/ # Phase 9 tests
| | |-- test_config.py # Configuration validation tests
| | |-- test_models.py # Model validation tests
| | |-- test_evaluator.py # Deterministic scoring tests
| | |-- test_ground_truth.py # Ground-truth resolution tests
| | |-- test_repository.py # Persistence tests
| | |-- test_service.py # Service orchestration tests
| | |-- test_phoenix_exporter.py # Phoenix export tests
| | +-- test_api.py # API endpoint tests
| +-- phase10/ # Phase 10 tests
| |-- test_config.py # Configuration validation tests
| |-- test_models.py # Model + enum validation tests
| |-- test_evidence_loader.py # Evidence merge + dedup tests
| |-- test_pattern_miner.py # Deterministic pattern grouping tests
| |-- test_scorer.py # Confidence scoring tests
| |-- test_repository.py # Persistence tests
| |-- test_service.py # Service orchestration tests
| +-- test_api.py # API endpoint tests
|-- output/ # Telemetry JSON files
|-- alembic.ini # Alembic configuration
|-- pytest.ini # Pytest configuration
|-- requirements.txt # Python dependencies
|-- .env.example # Configuration template
+-- README.md
Phase 2 runs independently of PX4/Gazebo. You only need PostgreSQL and the Phase 2 API.
docker compose -f docker/docker-compose.yml up postgres -d

PYTHONPATH=src .venv/bin/alembic upgrade head

./scripts/start_replay_api.sh
# API docs available at http://localhost:8000/docs
# Health check at http://localhost:8000/health

# Via the script (requires API running)
./scripts/import_mission.sh output/mission_20260608_120000.json
# Or via curl
curl -X POST http://localhost:8000/api/v1/missions/import \
-H "Content-Type: application/json" \
-d '{"path": "output/mission_20260608_120000.json", "overwrite": false}'

# List all missions
curl http://localhost:8000/api/v1/missions
# Get mission detail (includes faults)
curl http://localhost:8000/api/v1/missions/mission_20260608_120000
# Get telemetry events
curl http://localhost:8000/api/v1/missions/mission_20260608_120000/events
# Replay a mission
curl http://localhost:8000/api/v1/missions/mission_20260608_120000/replay
# Replay with time range
curl "http://localhost:8000/api/v1/missions/mission_20260608_120000/replay?from_ms=5000&to_ms=30000"

# Requires PostgreSQL running
PYTHONPATH=src .venv/bin/pytest tests/phase2/ -v

Phase 3 runs independently of PX4/Gazebo. You need Redis, the Phase 2 API (for replay data), and the Phase 3 State API.

docker compose -f docker/docker-compose.yml up redis -d

Phase 3 fetches replay frames from Phase 2, so the Phase 2 API must be running:
# Start PostgreSQL + run migrations if not already done
docker compose -f docker/docker-compose.yml up postgres -d
PYTHONPATH=src .venv/bin/alembic upgrade head
./scripts/start_replay_api.sh

./scripts/start_state_api.sh
# API docs available at http://localhost:8002/docs
# Health check at http://localhost:8002/health

# Via the script (requires both APIs running)
./scripts/process_mission_state.sh mission_20260608_120000
# Or via curl
curl -X POST http://localhost:8002/api/v1/state/process/mission_20260608_120000 \
-H "Content-Type: application/json" \
-d '{}'
# Process with time range (partial replay -- does not update current state)
curl -X POST http://localhost:8002/api/v1/state/process/mission_20260608_120000 \
-H "Content-Type: application/json" \
-d '{"from_ms": 5000, "to_ms": 30000}'

# Get current state snapshot
curl http://localhost:8002/api/v1/state/mission_20260608_120000/current
# Get full state timeline
curl http://localhost:8002/api/v1/state/mission_20260608_120000/timeline
# Get timeline for a time range
curl "http://localhost:8002/api/v1/state/mission_20260608_120000/timeline?from_ms=5000&to_ms=30000"
# Get state at a specific time
curl http://localhost:8002/api/v1/state/mission_20260608_120000/at/15000
# Get processing status
curl http://localhost:8002/api/v1/state/mission_20260608_120000/status

# Pure logic tests (no Redis required)
PYTHONPATH=src .venv/bin/pytest tests/phase3/test_phase_classifier.py tests/phase3/test_risk.py tests/phase3/test_state_processor.py -v
# All tests including Redis integration (requires Redis running)
PYTHONPATH=src .venv/bin/pytest tests/phase3/ -v

Phase 4 runs independently of PX4/Gazebo. You need Redis, the Phase 3 State API (for state timelines), and the Phase 4 Incident API.

# Start Redis
docker compose -f docker/docker-compose.yml up redis -d
# Start Phase 2 + Phase 3 APIs (Phase 4 depends on Phase 3 timelines)
./scripts/start_replay_api.sh &
./scripts/start_state_api.sh &

./scripts/start_incident_api.sh
# API docs available at http://localhost:8003/docs
# Health check at http://localhost:8003/health

# Via the script (requires Phase 3 + Phase 4 APIs running)
./scripts/process_mission_incidents.sh mission_20260608_120000
# Or via curl
curl -X POST http://localhost:8003/api/v1/incidents/process/mission_20260608_120000 \
-H "Content-Type: application/json" \
-d '{}'

# List all incidents for a mission
curl http://localhost:8003/api/v1/incidents/mission_20260608_120000
# List incidents within a time range
curl "http://localhost:8003/api/v1/incidents/mission_20260608_120000?from_ms=5000&to_ms=30000"
# Get a specific incident by ID
curl http://localhost:8003/api/v1/incidents/mission_20260608_120000/inc_abc123
# Get processing status
curl http://localhost:8003/api/v1/incidents/mission_20260608_120000/status

# Pure logic tests (no Redis required)
PYTHONPATH=src .venv/bin/pytest tests/phase4/test_rules.py tests/phase4/test_statistics.py tests/phase4/test_detector.py -v
# All tests including Redis integration (requires Redis running)
PYTHONPATH=src .venv/bin/pytest tests/phase4/ -v

Phase 5 runs independently of PX4/Gazebo. You need Redis, the Phase 4 Incident API (for incident data), and the Phase 5 Reasoning API.

# Start Redis
docker compose -f docker/docker-compose.yml up redis -d
# Start Phase 2 + Phase 3 + Phase 4 APIs
./scripts/start_replay_api.sh &
./scripts/start_state_api.sh &
./scripts/start_incident_api.sh &

# Set your Gemini API key in .env
echo "GEMINI_API_KEY=your-key-here" >> .env
# Or export directly
export GEMINI_API_KEY=your-key-here

Note: The API starts without a Gemini key, but analysis endpoints will return configuration errors. The health endpoint reports `gemini: unconfigured`.

./scripts/start_reasoning_api.sh
# API docs available at http://localhost:8004/docs
# Health check at http://localhost:8004/health

# Via the script (requires Phase 4 + Phase 5 APIs running)
./scripts/analyze_incident.sh mission_20260608_120000 inc_abc123
# Or via curl
curl -X POST http://localhost:8004/api/v1/reasoning/analyze/mission_20260608_120000/inc_abc123 \
-H "Content-Type: application/json" \
-d '{"overwrite": true}'

# Get analysis for a specific incident
curl http://localhost:8004/api/v1/reasoning/mission_20260608_120000/inc_abc123
# List all analyses for a mission
curl http://localhost:8004/api/v1/reasoning/mission_20260608_120000
# Reuse existing analysis (no Gemini call)
curl -X POST http://localhost:8004/api/v1/reasoning/analyze/mission_20260608_120000/inc_abc123 \
-H "Content-Type: application/json" \
-d '{"overwrite": false}'

# Pure logic tests (no Redis or Gemini required)
PYTHONPATH=src .venv/bin/pytest tests/phase5/test_models.py tests/phase5/test_prompts.py tests/phase5/test_provider.py tests/phase5/test_client.py -v
# All tests including Redis integration (requires Redis running)
PYTHONPATH=src .venv/bin/pytest tests/phase5/ -v

Phase 6 instruments the Phase 5 reasoning pipeline with OpenTelemetry tracing and exports spans to Arize Phoenix. It has no separate API; it hooks into Phase 5's service layer.

Set the following in .env:
PHOENIX_ENABLED=true
PHOENIX_ENDPOINT=http://localhost:6006
PHOENIX_PROJECT_NAME=tars-reasoning
PHOENIX_CONTENT_MODE=full # full | metadata | disabled

# All Phase 6 tests (no Phoenix required -- tracing is mocked)
PYTHONPATH=src .venv/bin/pytest tests/phase6/ -v

Phase 7 projects bounded facts from Phases 2, 4, and 5 into a Neo4j graph database. It connects missions → incidents → root causes → mitigations → outcomes and answers "Have we seen this before?" with provenance-preserving history queries.

# Start Neo4j (+ PostgreSQL and Redis for upstream phases)
docker compose -f docker/docker-compose.yml up neo4j postgres redis -d
# Start Phase 2 + Phase 3 + Phase 4 + Phase 5 APIs
./scripts/start_replay_api.sh &
./scripts/start_state_api.sh &
./scripts/start_incident_api.sh &
./scripts/start_reasoning_api.sh &

# Set your Neo4j password in .env (must match docker-compose)
echo "NEO4J_PASSWORD=tars" >> .env
# Or export directly
export NEO4J_PASSWORD=tars

Note: The default Docker Compose configuration sets the Neo4j password to `tars`. Schema constraints and indexes are created automatically on API startup.

./scripts/start_memory_api.sh
# API docs available at http://localhost:8005/docs
# Health check at http://localhost:8005/health

# Via the script (requires upstream APIs running)
./scripts/sync_mission_memory.sh mission_20260608_120000
# Or via curl
curl -X POST http://localhost:8005/api/v1/memory/sync \
-H "Content-Type: application/json" \
-d '{"mission_id": "mission_20260608_120000"}'
# Check sync status
curl http://localhost:8005/api/v1/memory/sync/mission_20260608_120000

# Get incident neighborhood (root causes, mitigations, outcomes)
curl http://localhost:8005/api/v1/memory/incidents/inc_abc123
# Find similar historical incidents
curl "http://localhost:8005/api/v1/memory/incidents/inc_abc123/similar?limit=10"
# Or via the script
./scripts/query_similar_incidents.sh inc_abc123

# Record an applied mitigation
curl -X POST http://localhost:8005/api/v1/memory/mitigations \
-H "Content-Type: application/json" \
-d '{
"incident_id": "inc_abc123",
"mitigation_text": "Switched to backup GPS receiver",
"applied_by": "operator"
}'
# Record an outcome
curl -X POST http://localhost:8005/api/v1/memory/outcomes \
-H "Content-Type: application/json" \
-d '{
"incident_id": "inc_abc123",
"status": "recovered",
"description": "GPS signal restored after switching to backup receiver",
"mitigation_application_id": "ma_xyz789"
}'

# All Phase 7 tests (no Neo4j required -- all graph operations are mocked)
PYTHONPATH=src .venv/bin/pytest tests/phase7/ -v
# Individual test modules
PYTHONPATH=src .venv/bin/pytest tests/phase7/test_models.py tests/phase7/test_mapper.py -v
PYTHONPATH=src .venv/bin/pytest tests/phase7/test_clients.py tests/phase7/test_repository.py -v
PYTHONPATH=src .venv/bin/pytest tests/phase7/test_service.py tests/phase7/test_api.py -v

Phase 8 adds analysis-only trace introspection to the reasoning pipeline. The reasoning agent can inspect its own prior reasoning traces through Phoenix, using 3 read-only MCP tools. This is not an evaluation layer: all outputs carry not_an_evaluation=True and explicit limitation warnings.

- Read-only: No trace creation, modification, or evaluation scores
- Fail-open: Phoenix unavailability never blocks reasoning — returns empty context
- Bounded: Max 10 traces per query, 2000-char summaries, 5s query timeout
- Safe: Secret redaction on all summaries, no raw telemetry exposure
- Opt-in: Disabled by default, requires both config flag and per-request flag
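
The fail-open, bounded, and safe properties listed above can be sketched together in a few lines. This is a minimal illustration, not the Phase 8 implementation: the names `fetch_context` and `redact`, the regular expression, and the shape of the `query` callable are assumptions. Only the three limits (10 traces, 2000 characters, 5 seconds) come from the list.

```python
import asyncio
import re
from typing import Awaitable, Callable, List

MAX_TRACES = 10            # "Max 10 traces per query"
MAX_SUMMARY_CHARS = 2000   # "2000-char summaries"
QUERY_TIMEOUT_S = 5.0      # "5s query timeout"

# Hypothetical redaction pattern; the real rules are not shown in this README.
_SECRET_RE = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+")


def redact(text: str) -> str:
    """Mask key=value style secrets, then truncate to the summary limit."""
    cleaned = _SECRET_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
    return cleaned[:MAX_SUMMARY_CHARS]


async def fetch_context(query: Callable[[], Awaitable[List[str]]],
                        timeout_s: float = QUERY_TIMEOUT_S) -> List[str]:
    """Fail-open wrapper: any error or timeout yields an empty context."""
    try:
        summaries = await asyncio.wait_for(query(), timeout=timeout_s)
    except Exception:
        # Phoenix is unavailable or slow: reasoning continues without context.
        return []
    return [redact(s) for s in summaries[:MAX_TRACES]]


if __name__ == "__main__":
    async def phoenix_down() -> List[str]:
        raise ConnectionError("phoenix unavailable")

    print(asyncio.run(fetch_context(phoenix_down)))
```

Catching `Exception` here is deliberate: the caller treats missing introspection context as normal, so no Phoenix failure mode should propagate into the reasoning path.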
# Enable in .env
PHOENIX_MCP_ENABLED=true
PHOENIX_MCP_CONTENT_MODE=summary # metadata | summary | full_dev | disabled
PHOENIX_MCP_GRAPHQL_ENDPOINT=http://localhost:6006/graphql
PHOENIX_MCP_MAX_TRACES=10
PHOENIX_MCP_MAX_SUMMARY_CHARS=2000
PHOENIX_MCP_QUERY_TIMEOUT_S=5.0
PHOENIX_MCP_REDACT_SECRETS=true
PHOENIX_MCP_REQUIRE_REQUEST_FLAG=true # Require use_introspection=true per request

| Tool | Description |
|---|---|
| `search_traces` | Find prior reasoning traces by mission_id, incident_type, root_cause, outcome, time range |
| `get_trace_summary` | Get a safe summary of a specific trace (redacted, truncated, no raw content) |
| `compare_traces` | Compare 2–5 traces, producing descriptive observations (never evaluative scores) |

# Analyze with introspection enabled (requires PHOENIX_MCP_ENABLED=true)
curl -X POST http://localhost:8004/api/v1/reasoning/analyze/mission_001/inc_abc123 \
-H "Content-Type: application/json" \
-d '{"use_introspection": true}'

The response includes introspection metadata when traces are found:

{
"introspection_used": true,
"introspection_trace_ids": ["trace_abc", "trace_def"],
"introspection_summary": "Consulted 2 prior traces for similar incidents"
}

| Mode | What's Included |
|---|---|
| `disabled` | Introspection completely off |
| `metadata` | Trace IDs, timestamps, incident types (no reasoning content) |
| `summary` | Metadata + redacted summaries and stage info |
| `full_dev` | Everything including raw content (development only) |
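
One way to read the four content modes is as progressively wider field filters over a trace record. The sketch below assumes a flat dictionary per trace. The field names (`trace_id`, `timestamp`, `incident_type`, `summary`, `stage`) and the function `apply_content_mode` are hypothetical; only the mode names come from the table.

```python
from typing import Any, Dict

# Illustrative field groupings; the real Phase 8 models may differ.
_METADATA_FIELDS = ("trace_id", "timestamp", "incident_type")
_SUMMARY_FIELDS = _METADATA_FIELDS + ("summary", "stage")


def apply_content_mode(trace: Dict[str, Any], mode: str) -> Dict[str, Any]:
    """Return only the fields that the given content mode allows."""
    if mode == "disabled":
        return {}
    if mode == "metadata":
        return {k: trace[k] for k in _METADATA_FIELDS if k in trace}
    if mode == "summary":
        return {k: trace[k] for k in _SUMMARY_FIELDS if k in trace}
    if mode == "full_dev":
        return dict(trace)  # development only: raw content included
    raise ValueError(f"unknown content mode: {mode!r}")


if __name__ == "__main__":
    trace = {
        "trace_id": "trace_abc",
        "timestamp": "2024-01-15T10:30:01Z",
        "incident_type": "gps_loss",
        "summary": "redacted summary text",
        "stage": "analysis",
        "raw_content": "full prompt and response",
    }
    print(apply_content_mode(trace, "metadata"))
```

Rejecting unknown modes with an error, rather than falling back to a permissive default, keeps a configuration typo from exposing more content than intended.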
# Phoenix MCP status is included in the Phase 5 health endpoint
curl http://localhost:8004/health
# Response includes: "phoenix_mcp": "ok" | "disabled" | "unavailable"

# All Phase 8 tests (no Phoenix required -- uses FakePhoenixTraceClient)
PYTHONPATH=src .venv/bin/pytest tests/phase8/ -v
# Individual test modules
PYTHONPATH=src .venv/bin/pytest tests/phase8/test_config.py tests/phase8/test_models.py -v
PYTHONPATH=src .venv/bin/pytest tests/phase8/test_summarizer.py tests/phase8/test_phoenix_client.py -v
PYTHONPATH=src .venv/bin/pytest tests/phase8/test_mcp_tools.py tests/phase8/test_tool_policy.py -v
PYTHONPATH=src .venv/bin/pytest tests/phase8/test_reasoning_integration.py -v

Phase 9 measures the quality of reasoning outputs against bounded ground-truth labels, mission outcomes, and incident facts. It produces durable, inspectable evaluation scores without changing operational behavior.

- Analysis-only: Never calls flight-control APIs, invokes Gemini, or mutates upstream records
- Bounded metrics: All scores are in [0.0, 1.0], and all results carry `advisory_only=True`
- Explicit evidence: Missing ground truth produces `insufficient_evidence`, not invented scores
- Fail-open: Phoenix and Phase 7 are optional; their unavailability does not fail evaluation
- Deterministic: All scoring uses versioned aliases and families, no LLM calls during evaluation
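
To make the "bounded metrics" and "explicit evidence" rules concrete, here is a hedged sketch of a deterministic root-cause score. It is not the logic in `evaluator.py`: the alias table, the `normalize` helper, the `"scored"` status value, and the exact-match scoring rule are assumptions. Only `insufficient_evidence`, `advisory_only=True`, and the [0.0, 1.0] range come from the list above.

```python
from typing import Any, Dict, Optional

# Hypothetical alias table; the real versioned aliases are not shown here.
ROOT_CAUSE_ALIASES: Dict[str, str] = {
    "gps_jamming": "gps_interference",
    "gps_signal_loss": "gps_interference",
}


def normalize(label: str) -> str:
    """Lowercase, snake_case, and map known aliases to a canonical label."""
    key = label.strip().lower().replace(" ", "_")
    return ROOT_CAUSE_ALIASES.get(key, key)


def score_root_cause(predicted: Optional[str],
                     truth: Optional[str]) -> Dict[str, Any]:
    """Return a score in [0.0, 1.0], or insufficient_evidence if unlabeled."""
    if not truth:
        # No ground truth: report that explicitly instead of inventing a score.
        return {"status": "insufficient_evidence", "score": None,
                "advisory_only": True}
    matched = bool(predicted) and normalize(predicted) == normalize(truth)
    return {"status": "scored", "score": 1.0 if matched else 0.0,
            "advisory_only": True}


if __name__ == "__main__":
    print(score_root_cause("GPS jamming", "gps_interference"))
    print(score_root_cause("battery_fault", None))
```

Because the function uses only table lookups and string comparison, the same inputs always produce the same result, which is the property the "Deterministic" bullet describes.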
docker compose -f docker/docker-compose.yml up postgres -d
PYTHONPATH=src .venv/bin/alembic upgrade head

./scripts/start_evaluation_api.sh
# API docs available at http://localhost:8006/docs
# Health check at http://localhost:8006/health

curl -X POST http://localhost:8006/api/v1/evaluations/labels \
-H "Content-Type: application/json" \
-d '{
"mission_id": "mission_20260618_120000",
"incident_id": "inc_abc123",
"root_cause": "gps_interference",
"preferred_mitigation": "switch_to_visual_odometry",
"outcome": "recovered",
"source": "operator_label",
"labeled_by": "operator"
}'

# With inline ground truth
curl -X POST http://localhost:8006/api/v1/evaluations/evaluate \
-H "Content-Type: application/json" \
-d '{
"mission_id": "mission_20260618_120000",
"incident_id": "inc_abc123",
"reasoning_id": "reason_abc123",
"ground_truth": {
"root_cause": "gps_interference",
"preferred_mitigation": "switch_to_visual_odometry",
"outcome": "recovered"
}
}'
# Using stored labels (no inline ground truth needed)
curl -X POST http://localhost:8006/api/v1/evaluations/evaluate \
-H "Content-Type: application/json" \
-d '{
"mission_id": "mission_20260618_120000",
"incident_id": "inc_abc123",
"reasoning_id": "reason_abc123"
}'

# Get a specific evaluation
curl http://localhost:8006/api/v1/evaluations/eval_abc123
# Get all evaluations for a mission
curl http://localhost:8006/api/v1/evaluations/mission/mission_20260618_120000
# Get all evaluations for a reasoning result
curl http://localhost:8006/api/v1/evaluations/reasoning/reason_abc123

curl -X POST http://localhost:8006/api/v1/evaluations/batch \
-H "Content-Type: application/json" \
-d '{
"targets": [
{"mission_id": "mission_001", "incident_id": "inc_001", "reasoning_id": "reason_001"},
{"mission_id": "mission_001", "incident_id": "inc_002", "reasoning_id": "reason_002"}
]
}'

# All Phase 9 tests (no PostgreSQL, Phoenix, Gemini, Neo4j, or PX4 required)
PYTHONPATH=src .venv/bin/pytest tests/phase9/ -v
# Individual test modules
PYTHONPATH=src .venv/bin/pytest tests/phase9/test_config.py tests/phase9/test_models.py -v
PYTHONPATH=src .venv/bin/pytest tests/phase9/test_evaluator.py tests/phase9/test_ground_truth.py -v
PYTHONPATH=src .venv/bin/pytest tests/phase9/test_service.py tests/phase9/test_api.py -v

Phase 10 mines Phase 9 evaluation records, Phase 7 operational memory, and safe trace metadata for repeated patterns. It produces candidate knowledge with evidence, confidence, and provenance. Candidate knowledge is NOT truth; it is input to Phase 11 validation.

- Deterministic only — no Gemini, no LLM calls
- advisory_only=True on every candidate, always
- Cautious language — "is associated with", never "causes" or "fixes"
- No validated status — candidates are `proposed`, `superseded`, `retired`, or `rejected`
- No flight-control impact — candidates never change recommendations
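
The constraints above (cautious wording, a closed status set, advisory-only candidates) can be enforced structurally. The sketch below is an assumption-laden illustration, not the Phase 10 code: the `Candidate` dataclass, `build_candidate`, the statement template, and the shrinkage formula for confidence are all invented for the example. The four statuses and the "is associated with" phrasing come from the list.

```python
from dataclasses import dataclass

ALLOWED_STATUSES = {"proposed", "superseded", "retired", "rejected"}

# Cautious association wording: "is associated with", never "causes" or "fixes".
TEMPLATE = ("Mitigation '{mitigation}' is associated with outcome "
            "'{outcome}' in {successes} of {total} evaluated incidents.")


@dataclass(frozen=True)
class Candidate:
    statement: str
    confidence: float
    evidence_count: int
    status: str = "proposed"
    advisory_only: bool = True

    def __post_init__(self) -> None:
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"status {self.status!r} is not allowed")
        if not self.advisory_only:
            raise ValueError("candidates must be advisory-only")


def build_candidate(mitigation: str, outcome: str, successes: int,
                    total: int, prior_weight: float = 5.0) -> Candidate:
    """Build a proposed candidate with a sample-size-aware confidence."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    # Hypothetical formula: shrink the raw rate toward 0.5 when evidence is thin.
    confidence = (successes + 0.5 * prior_weight) / (total + prior_weight)
    statement = TEMPLATE.format(mitigation=mitigation, outcome=outcome,
                                successes=successes, total=total)
    return Candidate(statement=statement, confidence=round(confidence, 3),
                     evidence_count=total)


if __name__ == "__main__":
    print(build_candidate("switch_to_visual_odometry", "recovered", 8, 10))
```

Since "validated" is simply absent from the allowed set, a candidate cannot be promoted by accident; that transition is left to the later validation phase.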
docker compose -f docker/docker-compose.yml up postgres -d
PYTHONPATH=src .venv/bin/alembic upgrade head

./scripts/start_learning_api.sh
# Or manually:
PYTHONPATH=src .venv/bin/uvicorn tars.phase10.api:app --host 0.0.0.0 --port 8007

# Full learning run (mines all candidate types)
curl -X POST http://localhost:8007/api/v1/learning/runs \
-H "Content-Type: application/json" \
-d '{}'
# Dry run (no persistence, returns candidates without saving)
curl -X POST http://localhost:8007/api/v1/learning/runs \
-H "Content-Type: application/json" \
-d '{"dry_run": true}'
# Filter by mission
curl -X POST http://localhost:8007/api/v1/learning/runs \
-H "Content-Type: application/json" \
-d '{"mission_ids": ["mission_001", "mission_002"]}'
# Filter by candidate type
curl -X POST http://localhost:8007/api/v1/learning/runs \
-H "Content-Type: application/json" \
-d '{"candidate_types": ["mitigation_effectiveness", "root_cause_pattern"]}'

# Get a specific learning run
curl http://localhost:8007/api/v1/learning/runs/{run_id}

# List all proposed candidates
curl "http://localhost:8007/api/v1/learning/candidates?status=proposed"
# Filter by type
curl "http://localhost:8007/api/v1/learning/candidates?candidate_type=mitigation_effectiveness"
# Paginate
curl "http://localhost:8007/api/v1/learning/candidates?limit=10&offset=20"
# Get a specific candidate
curl http://localhost:8007/api/v1/learning/candidates/{candidate_id}
# Get evidence for a candidate
curl http://localhost:8007/api/v1/learning/candidates/{candidate_id}/evidence

curl -X POST http://localhost:8007/api/v1/learning/candidates/{candidate_id}/retire \
-H "Content-Type: application/json" \
-d '{"reason": "Superseded by newer analysis"}'

# All Phase 10 tests (no PostgreSQL, Phoenix, Gemini, Neo4j, or PX4 required)
PYTHONPATH=src .venv/bin/pytest tests/phase10/ -v
# Individual test modules
PYTHONPATH=src .venv/bin/pytest tests/phase10/test_config.py tests/phase10/test_models.py -v
PYTHONPATH=src .venv/bin/pytest tests/phase10/test_evidence_loader.py tests/phase10/test_pattern_miner.py -v
PYTHONPATH=src .venv/bin/pytest tests/phase10/test_scorer.py tests/phase10/test_repository.py -v
PYTHONPATH=src .venv/bin/pytest tests/phase10/test_service.py tests/phase10/test_api.py -v

Each mission produces a JSON file in output/:

{
  "mission_id": "mission_001",
  "drone_id": "tars-sim-01",
  "start_time": "2024-01-15T10:30:00Z",
  "end_time": "2024-01-15T10:35:42Z",
  "faults_injected": [],
  "telemetry": [
    {
      "timestamp": "2024-01-15T10:30:01Z",
      "position": {
        "latitude_deg": 47.3977,
        "longitude_deg": 8.5456,
        "absolute_altitude_m": 488.5,
        "relative_altitude_m": 22.3
      },
      "battery": {
        "voltage_v": 11.8,
        "remaining_percent": 87.0
      },
      "gps": {
        "num_satellites": 12,
        "fix_type": "FIX_3D"
      },
      "attitude": {
        "roll_deg": 2.1,
        "pitch_deg": -1.3,
        "yaw_deg": 145.7
      },
      "flight_mode": "MISSION",
      "health": {
        "is_gyrometer_calibration_ok": true,
        "is_accelerometer_calibration_ok": true,
        "is_magnetometer_calibration_ok": true,
        "is_home_position_ok": true,
        "is_global_position_ok": true
      }
    }
  ],
  "mission_result": "SUCCESS",
  "summary": {
    "total_snapshots": 342,
    "duration_seconds": 342.0,
    "max_altitude_m": 22.5,
    "distance_traveled_m": 450.2,
    "min_battery_percent": 81.2,
    "max_speed_m_s": 5.2,
    "collection_rate_hz": 1.0
  }
}

PYTHONPATH=src .venv/bin/python3 -m tars.phase1.fault_injector

Available commands:
| Command | Fault | Effect |
|---|---|---|
| `1` | GPS Block | Complete GPS signal loss |
| `2` | GPS Noise | Noisy/jumpy GPS readings |
| `3` | Battery Drain | Accelerated battery discharge |
| `4` | Baro Offset | Incorrect altitude readings |
| `5` | Mag Offset | Corrupted compass heading |
| `6` | Wind | 8 m/s wind from north with moderate turbulence |
| `7` | Restore All | Remove all injected faults |
| Scenario | Inspired By | What Happens |
|---|---|---|
| s1 -- GPS Degradation | NASA Ingenuity | Progressive GPS noise -> block |
| s2 -- Altitude Confusion | Amazon MK30 | Conflicting altitude sensors |
| s3 -- Sensor Cascade | Bell 525 | Multiple sensors fail simultaneously |
| s4 -- Wind Shear | Drone delivery incidents | Progressive crosswind -> severe gust |
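After a scenario run, the mission JSON can be checked with a few lines of Python. The sketch below recomputes some summary values from the `telemetry` list using the field names from the sample output above. The helper name `summarize_mission` is hypothetical and not part of TARS, and the inline sample is trimmed to the fields the helper reads.

```python
import json
from typing import Any, Dict


def summarize_mission(mission: Dict[str, Any]) -> Dict[str, Any]:
    """Recompute a few summary values directly from the telemetry snapshots."""
    snapshots = mission.get("telemetry", [])
    battery = [s["battery"]["remaining_percent"] for s in snapshots]
    altitude = [s["position"]["relative_altitude_m"] for s in snapshots]
    return {
        "mission_id": mission["mission_id"],
        "faults_injected": mission.get("faults_injected", []),
        "total_snapshots": len(snapshots),
        "min_battery_percent": min(battery) if battery else None,
        "max_altitude_m": max(altitude) if altitude else None,
        # Distinct GPS fix types observed across the mission.
        "gps_fix_types": sorted({s["gps"]["fix_type"] for s in snapshots}),
    }


# Inline sample trimmed to the fields used above. A real run would instead
# json.load() a mission file from the output directory.
SAMPLE = json.loads("""
{
  "mission_id": "mission_001",
  "faults_injected": [],
  "telemetry": [
    {
      "position": {"relative_altitude_m": 22.3},
      "battery": {"remaining_percent": 87.0},
      "gps": {"num_satellites": 12, "fix_type": "FIX_3D"}
    }
  ],
  "mission_result": "SUCCESS"
}
""")

summary = summarize_mission(SAMPLE)
print(summary)
```

Comparing these recomputed values with the `summary` object in the file is a quick sanity check that no snapshots were dropped during collection.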
Edit .env or set environment variables:
| Variable | Default | Description |
|---|---|---|
| `PX4_CONNECTION` | `udp://:14540` | MAVSDK connection string |
| `TELEMETRY_RATE_HZ` | `1` | Snapshots per second |
| `DRONE_ID` | `tars-sim-01` | Drone identifier |
| `OUTPUT_DIR` | `output` | Telemetry output directory |
| `MISSION_ID` | auto-generated | Mission identifier |
| `FAULT_SCENARIO` | (none) | Run a fault scenario during the mission (s1–s4) |
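As an illustration of how these variables and defaults fit together, here is a minimal sketch of a loader. The `Phase1Config` class and `load_phase1_config` function are hypothetical names for this example, not the project's actual configuration code. Only the variable names and defaults come from the table above; the validation rules are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_SCENARIOS = {"s1", "s2", "s3", "s4"}


@dataclass(frozen=True)
class Phase1Config:
    px4_connection: str
    telemetry_rate_hz: float
    drone_id: str
    output_dir: str
    mission_id: Optional[str]      # None means "auto-generate"
    fault_scenario: Optional[str]  # None means "no scenario"


def load_phase1_config(env: Optional[Mapping[str, str]] = None) -> Phase1Config:
    """Read Phase 1 settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    scenario = env.get("FAULT_SCENARIO") or None
    # Assumed validation: only the four documented scenarios are accepted.
    if scenario is not None and scenario not in VALID_SCENARIOS:
        raise ValueError(f"FAULT_SCENARIO must be one of {sorted(VALID_SCENARIOS)}")
    rate = float(env.get("TELEMETRY_RATE_HZ", "1"))
    if rate <= 0:
        raise ValueError("TELEMETRY_RATE_HZ must be positive")
    return Phase1Config(
        px4_connection=env.get("PX4_CONNECTION", "udp://:14540"),
        telemetry_rate_hz=rate,
        drone_id=env.get("DRONE_ID", "tars-sim-01"),
        output_dir=env.get("OUTPUT_DIR", "output"),
        mission_id=env.get("MISSION_ID") or None,
        fault_scenario=scenario,
    )


defaults = load_phase1_config({})
override = load_phase1_config({"TELEMETRY_RATE_HZ": "5", "FAULT_SCENARIO": "s3"})
print(defaults)
print(override)
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the loader easy to test without touching the real environment.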
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://tars:tars@localhost:5432/tars` | PostgreSQL connection string |
| `API_HOST` | `0.0.0.0` | FastAPI server host |
| `API_PORT` | `8000` | FastAPI server port |
| Variable | Default | Description |
|---|---|---|
| `REDIS_URL` | `redis://localhost:6379/0` | Redis connection string |
| `PHASE2_API_URL` | `http://localhost:8000` | Phase 2 Replay API base URL |
| `STATE_API_HOST` | `0.0.0.0` | State API server host |
| `STATE_API_PORT` | `8002` | State API server port |
| Variable | Default | Description |
|---|---|---|
| `PHASE3_API_URL` | `http://localhost:8002` | Phase 3 State API base URL |
| `INCIDENT_API_HOST` | `0.0.0.0` | Incident API server host |
| `INCIDENT_API_PORT` | `8003` | Incident API server port |
| `INCIDENT_MAX_GAP_MS` | `5000` | Max gap between matches to merge |
| `INCIDENT_MIN_STATES` | `3` | Min states for persistence threshold |
| `INCIDENT_HIGH_RISK` | `0.8` | Risk threshold for immediate incident |
| `INCIDENT_ELEVATED_RISK` | `0.6` | Risk threshold for elevated detection |
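To make the four incident thresholds concrete, the sketch below shows one way gap-based merging and a persistence threshold could interact. It is an illustration under assumptions, not the Incident Engine's actual rule logic. The `Incident` class, the `detect_incidents` function, and the exact way the thresholds combine are hypothetical; only the default values come from the table above.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_GAP_MS = 5000      # INCIDENT_MAX_GAP_MS
MIN_STATES = 3         # INCIDENT_MIN_STATES
HIGH_RISK = 0.8        # INCIDENT_HIGH_RISK
ELEVATED_RISK = 0.6    # INCIDENT_ELEVATED_RISK


@dataclass
class Incident:
    start_ms: int
    end_ms: int
    state_count: int
    peak_risk: float


def detect_incidents(timeline: List[Tuple[int, float]]) -> List[Incident]:
    """timeline holds (timestamp_ms, risk_score) pairs for one mission."""
    groups: List[List[Tuple[int, float]]] = []
    for ts, risk in sorted(timeline):
        if risk < ELEVATED_RISK:
            continue  # state does not match the (assumed) elevated-risk rule
        # Merge into the previous group when the gap to its last match is small.
        if groups and ts - groups[-1][-1][0] <= MAX_GAP_MS:
            groups[-1].append((ts, risk))
        else:
            groups.append([(ts, risk)])

    incidents = []
    for group in groups:
        peak = max(risk for _, risk in group)
        # Assumed policy: keep a group if it persists long enough,
        # or immediately if any state crosses the high-risk threshold.
        if len(group) >= MIN_STATES or peak >= HIGH_RISK:
            incidents.append(Incident(group[0][0], group[-1][0], len(group), peak))
    return incidents


timeline = [
    (0, 0.20), (1000, 0.65), (2000, 0.70), (3000, 0.30),
    (4000, 0.62), (20000, 0.61), (40000, 0.90),
]
incidents = detect_incidents(timeline)
for incident in incidents:
    print(incident)
```

In this sample, the matches at 1000, 2000, and 4000 ms merge into one incident, the isolated match at 20000 ms is dropped for lacking persistence, and the single 0.90 reading at 40000 ms is kept because it crosses the high-risk threshold.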
| Variable | Default | Description |
|---|---|---|
| `PHASE4_API_URL` | `http://localhost:8003` | Phase 4 Incident API base URL |
| `REASONING_API_HOST` | `0.0.0.0` | Reasoning API server host |
| `REASONING_API_PORT` | `8004` | Reasoning API server port |
| `INCIDENT_CLIENT_TIMEOUT` | `30.0` | HTTP client timeout for Phase 4 calls |
| `GEMINI_API_KEY` | (empty) | Gemini API key (required for live reasoning) |
| `GEMINI_MODEL` | `gemini-2.5-flash` | Gemini model identifier |
| `GEMINI_TEMPERATURE` | `0.1` | Gemini temperature (low for stable reasoning) |
| Variable | Default | Description |
|---|---|---|
| `PHOENIX_ENABLED` | `true` | Enable/disable Phoenix tracing |
| `PHOENIX_ENDPOINT` | `http://localhost:6006` | Phoenix OTLP endpoint |
| `PHOENIX_PROJECT_NAME` | `tars-reasoning` | Phoenix project name |
| `PHOENIX_CONTENT_MODE` | `full` | Content capture: full, metadata, disabled |
| `PHOENIX_EXPORT_TIMEOUT_SECONDS` | `5` | OTLP export timeout |
| `PHOENIX_BATCH_EXPORT` | `true` | Use batch span processor |
| Variable | Default | Description |
|---|---|---|
| `NEO4J_URI` | `bolt://localhost:7687` | Neo4j Bolt connection URI |
| `NEO4J_USER` | `neo4j` | Neo4j username |
| `NEO4J_PASSWORD` | (empty) | Neo4j password |
| `NEO4J_DATABASE` | `neo4j` | Neo4j database name |
| `MEMORY_API_HOST` | `0.0.0.0` | Memory API server host |
| `MEMORY_API_PORT` | `8005` | Memory API server port |
| `PHASE2_API_URL` | `http://localhost:8000` | Phase 2 Replay API base URL |
| `PHASE4_API_URL` | `http://localhost:8003` | Phase 4 Incident API base URL |
| `PHASE5_API_URL` | `http://localhost:8004` | Phase 5 Reasoning API base URL |
| `MEMORY_CLIENT_TIMEOUT` | `30.0` | HTTP client timeout for upstream calls |
| `MEMORY_QUERY_DEFAULT_LIMIT` | `20` | Default result limit for queries |
| `MEMORY_QUERY_MAX_LIMIT` | `100` | Maximum result limit for queries |
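The two query-limit settings suggest a default-then-cap policy. The sketch below shows one plausible reading. The function name `resolve_limit` is hypothetical, and it is an assumption that an over-large request is clamped to the maximum rather than rejected.

```python
from typing import Optional

DEFAULT_LIMIT = 20   # MEMORY_QUERY_DEFAULT_LIMIT
MAX_LIMIT = 100      # MEMORY_QUERY_MAX_LIMIT


def resolve_limit(requested: Optional[int] = None) -> int:
    """Return the effective result limit for a memory query."""
    if requested is None:
        return DEFAULT_LIMIT
    if requested < 1:
        raise ValueError("limit must be at least 1")
    # Assumed behavior: clamp to the maximum instead of raising an error.
    return min(requested, MAX_LIMIT)


print(resolve_limit())      # no limit supplied
print(resolve_limit(50))    # within bounds
print(resolve_limit(500))   # above the maximum
```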
| Variable | Default | Description |
|---|---|---|
| `PHOENIX_MCP_ENABLED` | `false` | Enable Phoenix MCP self-introspection |
| `PHOENIX_MCP_CONTENT_MODE` | `summary` | Content capture: metadata, summary, full_dev, disabled |
| `PHOENIX_MCP_GRAPHQL_ENDPOINT` | `http://localhost:6006/graphql` | Phoenix GraphQL endpoint |
| `PHOENIX_MCP_MAX_TRACES` | `10` | Maximum traces per query |
| `PHOENIX_MCP_MAX_SUMMARY_CHARS` | `2000` | Maximum summary length |
| `PHOENIX_MCP_QUERY_TIMEOUT_S` | `5.0` | Query timeout in seconds |
| `PHOENIX_MCP_CACHE_TTL_S` | `300` | Cache TTL in seconds |
| `PHOENIX_MCP_REDACT_SECRETS` | `true` | Redact secrets from summaries |
| `PHOENIX_MCP_ALLOWED_TOOLS` | `search_traces,get_trace_summary,compare_traces` | Allowed MCP tools |
| `PHOENIX_MCP_REQUIRE_REQUEST_FLAG` | `true` | Require per-request use_introspection flag |
| Variable | Default | Description |
|---|---|---|
| `EVALUATION_ENABLED` | `true` | Enable the Phase 9 evaluation service |
| `EVALUATION_DATABASE_URL` | `postgresql+asyncpg://tars:tars@localhost:5432/tars` | PostgreSQL connection string |
| `EVALUATION_VERSION` | `v1.0` | Evaluator version stamped on results |
| `EVALUATION_API_HOST` | `0.0.0.0` | Evaluation API server host |
| `EVALUATION_API_PORT` | `8006` | Evaluation API server port |
| `EVALUATION_BATCH_LIMIT` | `50` | Maximum targets per batch request |
| `EVALUATION_CONSISTENCY_MIN_CASES` | `3` | Minimum cases for consistency scoring |
| `EVALUATION_SIMILARITY_LIMIT` | `20` | Maximum similar cases to compare |
| `EVALUATION_EXPORT_PHOENIX` | `false` | Export eval scores to Phoenix |
| `EVALUATION_REQUIRE_OPERATOR_LABEL` | `false` | Require explicit operator labels |
| `EVALUATION_ROOT_CAUSE_WEIGHT` | `0.40` | Overall score root-cause weight |
| `EVALUATION_RECOMMENDATION_WEIGHT` | `0.35` | Overall score recommendation weight |
| `EVALUATION_CONSISTENCY_WEIGHT` | `0.15` | Overall score consistency weight |
| `EVALUATION_FALSE_POSITIVE_WEIGHT` | `0.05` | Overall score false-positive penalty |
| `EVALUATION_FALSE_NEGATIVE_WEIGHT` | `0.05` | Overall score false-negative penalty |
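The five default weights sum to 1.0, which suggests a weighted average over the component scores. The sketch below shows one way such a score could be combined. The formula is an assumption, not the evaluator's confirmed implementation: in particular, treating each penalty as `weight * (1 - rate)` and the function name `overall_score` are illustrative choices.

```python
WEIGHTS = {
    "root_cause": 0.40,       # EVALUATION_ROOT_CAUSE_WEIGHT
    "recommendation": 0.35,   # EVALUATION_RECOMMENDATION_WEIGHT
    "consistency": 0.15,      # EVALUATION_CONSISTENCY_WEIGHT
    "false_positive": 0.05,   # EVALUATION_FALSE_POSITIVE_WEIGHT
    "false_negative": 0.05,   # EVALUATION_FALSE_NEGATIVE_WEIGHT
}


def overall_score(
    root_cause: float,
    recommendation: float,
    consistency: float,
    false_positive_rate: float,
    false_negative_rate: float,
) -> float:
    """Combine component scores (each in [0, 1]) into one value in [0, 1]."""
    inputs = (root_cause, recommendation, consistency,
              false_positive_rate, false_negative_rate)
    if any(not 0.0 <= value <= 1.0 for value in inputs):
        raise ValueError("all inputs must be in [0, 1]")
    return (
        WEIGHTS["root_cause"] * root_cause
        + WEIGHTS["recommendation"] * recommendation
        + WEIGHTS["consistency"] * consistency
        # Assumed penalty form: an error rate of 0 earns the full weight.
        + WEIGHTS["false_positive"] * (1.0 - false_positive_rate)
        + WEIGHTS["false_negative"] * (1.0 - false_negative_rate)
    )


score = overall_score(
    root_cause=0.9,
    recommendation=0.8,
    consistency=1.0,
    false_positive_rate=0.0,
    false_negative_rate=1.0,
)
print(f"overall score: {score:.2f}")
```

With this form, a perfect evaluation scores 1.0 and changing a weight shifts emphasis without changing the output range, provided the weights still sum to 1.0.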
| Variable | Default | Description |
|---|---|---|
| `LEARNING_ENABLED` | `true` | Enable the Phase 10 learning service |
| `LEARNING_DATABASE_URL` | `postgresql+asyncpg://tars:tars@localhost:5432/tars` | PostgreSQL connection string |
| `LEARNING_VERSION` | `v1.0` | Learning engine version stamped on candidates |
| `LEARNING_API_HOST` | `0.0.0.0` | Learning API server host |
| `LEARNING_API_PORT` | `8007` | Learning API server port |
| `LEARNING_MIN_EVALUATED_CASES` | `5` | Minimum evaluated cases to form a pattern |
| `LEARNING_MIN_DISTINCT_MISSIONS` | `3` | Minimum distinct missions for diversity |
| `LEARNING_MIN_CONFIDENCE` | `0.60` | Minimum confidence to emit a candidate |
| `LEARNING_MIN_SUCCESS_RATE` | `0.70` | Minimum success rate for mitigation patterns |
| `LEARNING_MAX_FALSE_POSITIVE_RATE` | `0.20` | Maximum false-positive rate allowed |
| `LEARNING_SCORING_SUPPORT_WEIGHT` | `0.35` | Confidence score support weight |
| `LEARNING_SCORING_OUTCOME_WEIGHT` | `0.25` | Confidence score outcome weight |
| `LEARNING_SCORING_EVALUATION_WEIGHT` | `0.20` | Confidence score evaluation weight |
| `LEARNING_SCORING_DIVERSITY_WEIGHT` | `0.10` | Confidence score diversity weight |
| `LEARNING_SCORING_CONTRADICTION_WEIGHT` | `0.10` | Confidence score contradiction penalty weight |
| `PHASE9_API_URL` | `http://localhost:8006` | Phase 9 API base URL |
| `PHASE7_API_URL` | `http://localhost:8005` | Phase 7 API base URL |
| `LEARNING_TRACE_METADATA_ENABLED` | `false` | Enable Phoenix trace metadata enrichment |
| `PHOENIX_BASE_URL` | `http://localhost:6006` | Phoenix API base URL |
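The learning thresholds and scoring weights can be read as a two-step gate: hard minimums first, then a weighted confidence score. The sketch below illustrates that reading under stated assumptions and is not the Learning Engine's actual scorer. The `PatternStats` fields, the saturation points for support and diversity (twice the configured minimum), and the function names are all hypothetical. Only the default values come from the table above.

```python
from dataclasses import dataclass

MIN_CASES = 5            # LEARNING_MIN_EVALUATED_CASES
MIN_MISSIONS = 3         # LEARNING_MIN_DISTINCT_MISSIONS
MIN_CONFIDENCE = 0.60    # LEARNING_MIN_CONFIDENCE
MIN_SUCCESS_RATE = 0.70  # LEARNING_MIN_SUCCESS_RATE
MAX_FP_RATE = 0.20       # LEARNING_MAX_FALSE_POSITIVE_RATE

W_SUPPORT, W_OUTCOME, W_EVALUATION = 0.35, 0.25, 0.20
W_DIVERSITY, W_CONTRADICTION = 0.10, 0.10


@dataclass
class PatternStats:
    evaluated_cases: int
    distinct_missions: int
    success_rate: float           # fraction of cases with a good outcome
    false_positive_rate: float
    mean_evaluation_score: float  # e.g. averaged Phase 9 overall scores
    contradiction_rate: float     # fraction of cases contradicting the pattern


def confidence(stats: PatternStats) -> float:
    # Assumption: support and diversity saturate at twice the configured minimum.
    support = min(stats.evaluated_cases / (2 * MIN_CASES), 1.0)
    diversity = min(stats.distinct_missions / (2 * MIN_MISSIONS), 1.0)
    return (
        W_SUPPORT * support
        + W_OUTCOME * stats.success_rate
        + W_EVALUATION * stats.mean_evaluation_score
        + W_DIVERSITY * diversity
        + W_CONTRADICTION * (1.0 - stats.contradiction_rate)
    )


def should_emit(stats: PatternStats) -> bool:
    """Apply the hard minimums first, then the confidence threshold."""
    if stats.evaluated_cases < MIN_CASES or stats.distinct_missions < MIN_MISSIONS:
        return False
    if stats.success_rate < MIN_SUCCESS_RATE:
        return False
    if stats.false_positive_rate > MAX_FP_RATE:
        return False
    return confidence(stats) >= MIN_CONFIDENCE


strong = PatternStats(10, 6, 0.80, 0.10, 0.75, 0.0)
thin = PatternStats(4, 2, 0.90, 0.00, 0.90, 0.0)
print(round(confidence(strong), 2), should_emit(strong))
print(round(confidence(thin), 2), should_emit(thin))
```

Checking the hard minimums before scoring means a pattern seen in too few cases or missions is never emitted, however high its individual scores are.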
Developed and tested on:
- CPU: Intel i3-6100U (2C/4T @ 2.3GHz)
- RAM: 8GB
- GPU: Intel HD 520 (integrated)
- OS: Pop!_OS 22.04
Gazebo runs in headless mode (no 3D rendering) to fit within RAM constraints. Use QGroundControl on the host for visual drone tracking on a 2D map.
| Phase | Name | Status |
|---|---|---|
| 1 | Mission Foundation (PX4 + Gazebo + MAVSDK) | ✅ Done |
| 2 | Mission Replay System (FastAPI + PostgreSQL) | ✅ Done |
| 3 | State Engine (Python + Redis) | ✅ Done |
| 4 | Incident Engine (Rules + Statistical Detection) | ✅ Done |
| 5 | Gemini Reasoning Layer (Google ADK) | ✅ Done |
| 6 | Phoenix Integration (OpenInference Tracing) | ✅ Done |
| 7 | Neo4j Operational Memory | ✅ Done |
| 8 | Phoenix MCP (Self-Introspection) | ✅ Done |
| 9 | Evaluation Layer (Reasoning Quality Metrics) | ✅ Done |
| 10 | Learning Engine | ✅ Current |
| 11 | Knowledge Validation | Planned |
| 12 | Adaptive Recommendation Engine | Planned |
MIT