Merged
35 changes: 35 additions & 0 deletions .dockerignore
@@ -0,0 +1,35 @@
# Version control
.git/
.gitignore

# Python artifacts
.venv/
__pycache__/
*.pyc
*.pyo
*.pyd
.pytest_cache/
*.egg-info/
dist/
build/
.mypy_cache/
.ruff_cache/

# Tests and fixtures — not needed in the image
tests/

# Dev and internal docs
.docs/
.claude/

# Infrastructure-as-code — not part of the application
infra/

# Runtime data — always mounted at runtime, never baked in
logs/
data/
conf.yaml

# Secrets — never in the image
.env
.env.*
96 changes: 74 additions & 22 deletions .env.example
@@ -1,45 +1,97 @@
 # ──────────────────────────────────────────────────────────────────────────────
-# ARIA — Secrets template
+# ARIA — Secrets and runtime env vars template
 # Copy this file to .env and fill in your values. Never commit .env to Git.
 # Non-secret configuration (model IDs, connector types, GCP settings, etc.)
-# lives in conf.yaml — see conf_template.yaml.
+# lives in conf.yaml — see conf_template.yaml for all options.
 # ──────────────────────────────────────────────────────────────────────────────

+
+# ── Required (all deployments) ────────────────────────────────────────────────
+
 # ServiceNow — password for the service account defined in conf.yaml (servicenow.user)
 SNOW_PASSWORD=<your-servicenow-password>

-# Anthropic — API key for LLM calls across all agents
-# Reference implementation uses Anthropic. Swap for your provider if you bring
-# your own LLMClientInterface implementation.
+# Slack — bot token with chat:write scope (channel set in conf.yaml slack.channel_id)
+SLACK_BOT_TOKEN=<your-slack-bot-token>
+
+
+# ── LLM provider — set ONE block depending on llm.provider in conf.yaml ───────
+
+# --- anthropic (llm.provider: anthropic) — recommended for non-GCP deployments
 ANTHROPIC_API_KEY=<your-anthropic-api-key>

-# Slack — bot token with chat:write scope, for the channel defined in conf.yaml (slack.channel_id)
-SLACK_BOT_TOKEN=<your-slack-bot-token>
+# --- vertex_ai (llm.provider: vertex_ai) — GCP container deployments (no API key needed)
+# Auth is via ADC — set GOOGLE_APPLICATION_CREDENTIALS if not running on GKE/Cloud Run.
+# GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json # only if ADC is not auto-resolved
+VERTEX_AI_PROJECT_ID=<your-gcp-project-id>
+VERTEX_AI_LOCATION=europe-west1

-# MS Teams (optional — alternative notifier; swap connector in dependencies.py)
-TEAMS_WEBHOOK_URL=<your-teams-incoming-webhook-url>
+# --- claude_code (llm.provider: claude_code) — local dev only, NOT for production (#84)
+# No additional env vars needed; uses the local Claude Code CLI subscription.
+
+
+# ── Vault backend — set ONE block depending on runtime.vault_backend in conf.yaml ─
+
+# --- env (default) — secrets come from this .env file; no additional config needed.
+
+# --- gcp — GCP Secret Manager via ADC
+GCP_PROJECT_ID=<your-gcp-project-id>
+
+# --- hashicorp — HashiCorp Vault
+VAULT_TOKEN=<your-vault-token>
+
+# --- aws — AWS Secrets Manager
+AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
+AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>

-# Google Chat (optional — alternative notifier; swap connector in dependencies.py)
-GOOGLE_CHAT_WEBHOOK_URL=<your-google-chat-webhook-url>

+# ── Pipeline behaviour ────────────────────────────────────────────────────────
+
+# Path to a custom conf.yaml (default: ./conf.yaml relative to working directory)
+# ARIA_CONFIG_PATH=/etc/aria/conf.yaml
+
+# Enable the built-in Alpine.js ops dashboard at /dashboard
+# ARIA_DASHBOARD_ENABLED=true
+
+# Operating mode: inform | hitm | autonomous (only 'inform' is implemented in Phase 1.5)
+# ARIA_OPERATING_MODE=inform
+
+# LLM provider override — overrides llm.provider in conf.yaml
+# ARIA_LLM_PROVIDER=anthropic
+
+# Vault backend override — overrides runtime.vault_backend in conf.yaml
+# ARIA_VAULT_BACKEND=env
+
+# Log format: human (coloured, for terminals) | json (for log aggregators)
+# ARIA_LOG_FORMAT=human
+
+# Log directory for rolling file output
+# ARIA_LOG_DIR=logs/
+
+# SQLite run history database path
+# ARIA_RUN_DB_PATH=data/runs.db
+
+# Dry-run mode — uses in-memory stubs; no real ServiceNow/Slack/SSH calls
+# ARIA_DRY_RUN=false
+
+
+# ── Optional connectors ───────────────────────────────────────────────────────
+
 # CDP — SSH private key PEM content for Agent 2 log extraction from Cloudera CDP nodes
 # Set via: export CDP_SSH_KEY="$(cat /path/to/private_key)"
 CDP_SSH_KEY=<pem-content-of-ssh-private-key>

-# CDP — SSH host public key for strict host verification (recommended, prevents MITM attacks)
+# CDP — SSH host public key for strict host verification (prevents MITM attacks)
 # Format: "<key-type> <base64-encoded-public-key>" e.g. "ssh-ed25519 AAAA..."
-# If not set, ARIA falls back to WarningPolicy (logs a warning but still connects)
+# Leave empty to use WarningPolicy (logs a warning but still connects)
 CDP_HOST_KEY=

-# GCP — service account JSON key (base64-encoded) for BigQuery and GCS access
-# Only required when connectors.log = gcp in conf.yaml
+# GCP — service account JSON key (base64-encoded) for Cloud Logging / BigQuery access
+# Only needed if NOT using ADC (e.g. running outside GCP with a SA key file)
 GCP_SA_KEY=<base64-encoded-service-account-json>

-# AWS — credentials for the AWS Secrets Manager vault implementation
-# Only required if you are using the AWS SM vault backend
-AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
-AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>
+# MS Teams (optional alternative notifier — swap connector in dependencies.py)
+TEAMS_WEBHOOK_URL=<your-teams-incoming-webhook-url>

-# HashiCorp Vault — token for the Vault vault implementation
-# Only required if you are using the Vault backend
-VAULT_TOKEN=<your-vault-token>
+# Google Chat (optional alternative notifier — swap connector in dependencies.py)
+GOOGLE_CHAT_WEBHOOK_URL=<your-google-chat-webhook-url>
28 changes: 28 additions & 0 deletions .github/workflows/ci.yml
@@ -52,3 +52,31 @@ jobs:

      - name: pytest (unit)
        run: pytest tests/unit/ -v

  docker-smoke:
    name: Docker build + smoke test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t aria:ci .

      - name: Smoke test
        run: |
          docker run -d --name aria-ci \
            -e ARIA_DRY_RUN=true \
            -e ARIA_LLM_PROVIDER=anthropic \
            -e ANTHROPIC_API_KEY=dummy \
            -p 8000:8000 aria:ci
          # Wait for the API to boot: poll the health endpoint up to 15 times, 2 s apart.
          for i in $(seq 1 15); do
            if curl -sf http://localhost:8000/api/v1/health; then
              echo "Health check passed"
              break
            fi
            echo "Waiting... ($i/15)"
            sleep 2
          done
          # Final check fails the step if the API never became healthy.
          curl -sf http://localhost:8000/api/v1/health
          docker stop aria-ci
29 changes: 29 additions & 0 deletions Dockerfile
@@ -0,0 +1,29 @@
FROM python:3.11-slim

# curl is needed for the HEALTHCHECK command below.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Create a non-root user before copying any files.
RUN adduser --disabled-password --uid 1000 aria

# Install dependencies first so this layer is cached when only source changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application source.
COPY . .

RUN chown -R aria:aria /app
USER aria

EXPOSE 8000

# Health check hits /api/v1/health and marks the container unhealthy after 3 consecutive failures.
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:8000/api/v1/health || exit 1

CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
49 changes: 47 additions & 2 deletions README.md
@@ -549,6 +549,51 @@ uvicorn api.main:app --reload

---

## Deployment

ARIA ships as a single Docker image. No Python installation is required on the target machine — only Docker (local/VM) or a Kubernetes cluster (production). The same image works across all environments; what changes is how `conf.yaml` and secrets are injected.
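
As a minimal sketch of what "injected" can mean at startup, the snippet below resolves the config path and an environment override in the way `.env.example` describes: `ARIA_CONFIG_PATH` defaults to `conf.yaml` in the working directory, and `ARIA_LLM_PROVIDER` / `ARIA_VAULT_BACKEND` take precedence over `llm.provider` / `runtime.vault_backend`. The function names `resolve_config_path` and `resolve_setting` are hypothetical and are not taken from the ARIA codebase; the real loader may differ.

```python
from pathlib import Path
from typing import Any, Mapping, Optional


def resolve_config_path(env: Mapping[str, str]) -> Path:
    """Use ARIA_CONFIG_PATH when set and non-empty, else conf.yaml in the working directory."""
    return Path(env.get("ARIA_CONFIG_PATH") or "conf.yaml")


def resolve_setting(
    env: Mapping[str, str],
    env_var: str,
    conf: Mapping[str, Any],
    section: str,
    key: str,
) -> Optional[str]:
    """Return the env override when present, otherwise the value from the parsed conf."""
    override = env.get(env_var)
    if override:
        return override
    return conf.get(section, {}).get(key)


# Example: conf stands in for an already-parsed conf.yaml (assumed shape).
conf = {"llm": {"provider": "anthropic"}, "runtime": {"vault_backend": "env"}}
env = {"ARIA_CONFIG_PATH": "/etc/aria/conf.yaml", "ARIA_LLM_PROVIDER": "vertex_ai"}

provider = resolve_setting(env, "ARIA_LLM_PROVIDER", conf, "llm", "provider")
vault = resolve_setting(env, "ARIA_VAULT_BACKEND", conf, "runtime", "vault_backend")
print(provider, vault)  # vertex_ai env
```

Passing `env` as a mapping instead of reading `os.environ` directly keeps the functions easy to test; in a real process you would pass `os.environ`.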

### Docker (local machine or VM)

```bash
# 1. Build
docker build -t aria:latest .

# 2. Run — mount your conf.yaml; pass secrets as env vars
docker run -d \
  --name aria \
  -p 8000:8000 \
  -v /path/to/conf.yaml:/etc/aria/conf.yaml:ro \
  -e ARIA_CONFIG_PATH=/etc/aria/conf.yaml \
  -e SNOW_PASSWORD=<your-password> \
  -e ANTHROPIC_API_KEY=<your-key> \
  -e SLACK_BOT_TOKEN=<your-token> \
  aria:latest

# 3. Verify
curl http://localhost:8000/api/v1/health
```
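
A single `curl` right after `docker run -d` can fail simply because the API is still booting. The CI workflow in this change handles that by polling the health endpoint up to 15 times, 2 seconds apart. The sketch below does the same from Python using only the standard library; `wait_for_health` is a hypothetical helper, not part of ARIA, and the defaults simply mirror the CI loop.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, attempts: int = 15, delay: float = 2.0) -> bool:
    """Poll url until it answers with HTTP 2xx or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if 200 <= response.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused/reset, timeout, or a 4xx/5xx response: retry.
            pass
        if attempt < attempts:
            time.sleep(delay)
    return False
```

Calling `wait_for_health("http://localhost:8000/api/v1/health")` returns `True` as soon as the container answers and `False` after roughly 30 seconds of failures, so a deploy script can exit non-zero on `False`.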

### Kubernetes

`conf.yaml` is delivered via a ConfigMap; secrets via a Kubernetes Secret or GCP Secret Manager (Workload Identity, no API key in the pod):

```bash
kubectl create namespace aria
kubectl create configmap aria-config --from-file=conf.yaml=./conf.yaml -n aria
kubectl create secret generic aria-secrets \
  --from-literal=SNOW_PASSWORD=<pw> \
  --from-literal=ANTHROPIC_API_KEY=<key> \
  --from-literal=SLACK_BOT_TOKEN=<token> \
  -n aria
```

Then apply a Deployment that mounts the ConfigMap at `/etc/aria/conf.yaml` and sets `ARIA_CONFIG_PATH=/etc/aria/conf.yaml`. For GCP clusters, set `llm.provider: vertex_ai` and `runtime.vault_backend: gcp` in `conf.yaml` — the pod authenticates via Workload Identity with no credentials in the container.
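
For orientation, here is a minimal sketch of such a Deployment, generated as JSON (which `kubectl apply` accepts alongside YAML). It reuses the `aria-config` ConfigMap, the `aria-secrets` Secret, and the `aria` namespace created above. The Deployment name, the `app: aria` label, the volume name, the single replica, and the `aria:latest` image reference are assumptions for illustration; a real cluster needs an image reference it can pull from a registry, and the full guide linked below remains the authoritative manifest.

```python
import json


def build_deployment(image: str = "aria:latest", replicas: int = 1) -> dict:
    """Return an apps/v1 Deployment manifest for ARIA as a plain dict."""
    labels = {"app": "aria"}  # hypothetical label, not from the source
    container = {
        "name": "aria",
        "image": image,
        "ports": [{"containerPort": 8000}],
        # Point ARIA at the mounted config file.
        "env": [{"name": "ARIA_CONFIG_PATH", "value": "/etc/aria/conf.yaml"}],
        # Expose every key of the aria-secrets Secret as an environment variable.
        "envFrom": [{"secretRef": {"name": "aria-secrets"}}],
        # subPath mounts only the conf.yaml key as a file, not a whole directory.
        "volumeMounts": [
            {
                "name": "config",
                "mountPath": "/etc/aria/conf.yaml",
                "subPath": "conf.yaml",
                "readOnly": True,
            }
        ],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "aria", "namespace": "aria"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [
                        {"name": "config", "configMap": {"name": "aria-config"}}
                    ],
                },
            },
        },
    }


def render(manifest: dict) -> str:
    """Serialize the manifest as indented JSON."""
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    print(render(build_deployment()))
```

Saved as a script, the output can be piped straight into `kubectl apply -f -`. Note that a ConfigMap mounted with `subPath` is not refreshed in running pods when the ConfigMap changes, so a rollout restart is needed after editing `conf.yaml`.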

**Full guide** (conf.yaml preparation, docker-compose, GKE Deployment + Service YAML, LLM provider selection, vault backend options): [documentation/guides/installation.md](documentation/guides/installation.md)

---

## Acceptance criteria (Phase 1)

Phase 1 is complete when all of the following pass on 10 consecutive test incidents:
@@ -580,8 +625,8 @@ Phase 1 is complete when all of the following pass on 10 consecutive test incide
| Phase 1 | S8: ReAct loop trigger — cross-service log requests | ✅ Done |
| Phase 1 | M7: Acceptance criteria validated on local environment | ✅ Done |
| Phase 1.5 | S1: Structured logging — structlog, `run_id`, lifecycle events, RunRecord | ✅ Done |
-| **Phase 1.5** | **S2: Monitoring foundation — run store, REST API, Alpine.js dashboard, mode scaffold** | 🔜 Next |
-| Phase 1.5 | S3: Docker + `ARIA_CONFIG_PATH` + `VertexAILLMClient` + LLM provider DI | 🔜 Planned |
+| Phase 1.5 | S2: Monitoring foundation — run store, REST API, Alpine.js dashboard, mode scaffold | ✅ Done |
+| Phase 1.5 | S3: Docker + `ARIA_CONFIG_PATH` + `VertexAILLMClient` + LLM provider DI (incl. #84 security fix) | ✅ Done |
| Phase 1.5 | S4: Testing infrastructure — UC1/UC2/UC3 cluster wiring, KB runbooks, CMDB validation | 🔜 Planned |
| Phase 1.5 | S5: Round 2 acceptance testing — 30 incidents on UC1 + UC2 real infrastructure | 🔜 Planned |
| Phase 1.5 | S6: GCP native connectors — BQ, Cloud Functions, Pub/Sub, GCS | 🔜 Planned |