Keystone is a minimal, robust, and secure edge orchestration agent written in Go. It manages local components (native processes and containers), executes deployments atomically with rollback, and keeps devices converging to a desired state — even with flaky networks.
Why Keystone? Because edge fleets need something that is lightweight, predictable, and operable without dragging a full container stack everywhere. Keystone embraces "processes first, containers when needed," with a clean, pluggable design inspired by Greengrass. When you need containers, Keystone supports containerd natively or falls back to Docker/nerdctl/podman CLI.
- Lightweight: idle CPU ~0%, small RAM baseline (<40MB)
- Solid: atomic deployments, checkpoints, rollback, exponential backoff
- Secure: mTLS, artifact signatures (ECDSA/RSA), checksums
- Portable: Linux x86/ARM, single binary, no mandatory Docker/CRI
- Connected: HTTP REST, NATS (+ JetStream), MQTT adapters
- Operable: structured logs, Prometheus metrics, health endpoints, persistence
| Feature | AWS Greengrass | Keystone |
|---|---|---|
| Runtime | Java (JVM) / C (Lite) | Go (Native) |
| RAM Baseline | ~100MB+ | < 40MB |
| Complexity | High (Cloud-first) | Low (Lean & Simple) |
| Setup | Heavy Bootstrap | Single Binary |
| Control Plane | AWS IoT Core only | HTTP, NATS, MQTT |
| Offline Mode | Limited | Full (JetStream jobs) |
Keystone uses simple TOML recipes to define components and deployment plans to keep your devices in sync.
Describe how to install and run your process in com.example.hello.recipe.toml:
[metadata]
name = "com.example.hello"
version = "1.0.0"
[[artifacts]]
uri = "https://example.com/artifacts/hello.tar.gz"
sha256 = "..."
unpack = true
[lifecycle.run.exec]
command = "./hello"
args = ["--interval", "30s"]

Run containerized workloads when needed (uses containerd or docker/nerdctl/podman):
[metadata]
name = "com.example.nginx"
version = "1.0.0"
[lifecycle.run]
type = "container"
restart_policy = "always"
[lifecycle.run.container]
image = "docker.io/library/nginx:alpine"
pull_policy = "if-not-present"
network_mode = "bridge"
[[lifecycle.run.container.mounts]]
source = "/data/nginx/html"
target = "/usr/share/nginx/html"
read_only = true
[[lifecycle.run.container.ports]]
host_port = 8080
container_port = 80
[lifecycle.run.container.resources]
memory_mb = 256
cpu_shares = 512
[lifecycle.run.health]
check = "http://localhost:8080/"
interval = "10s"

For complete container documentation, see docs/containers.md.
List the components you want to run in plan.toml:
[[components]]
name = "hello"
recipe = "com.example.hello.recipe.toml"

Deploy it with a single command:
./keystonectl apply plan.toml

See everything running at a glance:
./keystonectl status

MVP Complete. Keystone is ready for production evaluation. The agent includes a complete supervisor, process runner, artifact manager, deployment engine, and multiple control plane adapters (HTTP, NATS, MQTT). See the roadmap below for upcoming features.

go run ./cmd/keystone --http :8080

For development with live reload (requires air):
task dev

Probe the health endpoint:
curl -s localhost:8080/healthz | jq

You should see a JSON response with status and uptime.
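The same probe can be scripted from Go, for example in a provisioning tool. The following is a minimal sketch: the JSON field names `status` and `uptime` are inferred from the sentence above, and the type of `uptime` is not documented here, so it is kept as raw JSON. The helper names `parseHealth` and `fetchHealth` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// health mirrors the two fields mentioned in this README. The exact schema is
// an assumption; uptime stays raw because its JSON type is not documented here.
type health struct {
	Status string          `json:"status"`
	Uptime json.RawMessage `json:"uptime"`
}

// parseHealth decodes a /healthz response body.
func parseHealth(body []byte) (health, error) {
	var h health
	err := json.Unmarshal(body, &h)
	return h, err
}

// fetchHealth performs a GET with a short timeout and decodes the result.
func fetchHealth(url string) (health, error) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return health{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return health{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return health{}, err
	}
	return parseHealth(body)
}

func main() {
	h, err := fetchHealth("http://localhost:8080/healthz")
	if err != nil {
		fmt.Fprintln(os.Stderr, "health check failed:", err)
		os.Exit(1)
	}
	fmt.Printf("status=%s uptime=%s\n", h.Status, h.Uptime)
}
```

Exiting non-zero on failure makes the program usable as a readiness gate in scripts.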
Run the built-in demo stack (db -> cache -> api):
go run ./cmd/keystone --demo

Scrape metrics (Prometheus format):
curl -s localhost:8080/metrics | head

- Phase 0: Agent skeleton, config base, /healthz, persistent state snapshotting
- Phase 1: Supervisor + ProcessRunner, lifecycle hooks, health checks (HTTP/TCP/Shell)
- Phase 2: DAG-based deployments, layer-wise rollback, Prometheus metrics
- Phase 3: Security hardening — mTLS adapters, artifact signatures (ECDSA/RSA)
- Phase 4: Control plane adapters — HTTP REST, NATS (+ JetStream), MQTT
- Phase 5: Robustness — download resume, exponential backoff, graceful shutdown
- Phase 6: ContainerRunner — containerd client, CLI fallback (docker/nerdctl/podman)
- Phase 7: Self-update and canary rings
See KeyStone.md for the architecture proposal and delivery plan.
| Category | Features |
|---|---|
| Supervisor | DAG execution, parallel layer startup, FSM lifecycle, dependency ordering |
| ProcessRunner | Process management, log streaming, health probes (HTTP/TCP/cmd), restart policies, exponential backoff |
| ContainerRunner | containerd client, CLI fallback (docker/nerdctl/podman), image pull, mounts, ports, resource limits |
| Deployment Engine | TOML plans and recipes, environment variable substitution, dry-run mode |
| Artifact Manager | Secure download with resume, SHA-256 verification, detached signatures, GC, cache limits |
| Security | Trust bundles (PEM), ECDSA/RSA signature verification, mTLS support |
| Observability | Prometheus metrics, structured logging, health endpoints, per-process metrics |
| Persistence | Automatic state snapshotting, recovery on restart, atomic writes |
| Control Plane | HTTP REST API, NATS adapter (+ JetStream jobs), MQTT adapter (QoS, LWT) |
| Robustness | Download resume (HTTP Range), exponential backoff with jitter, context propagation, graceful shutdown |
Use a minimal TOML plan to run real processes via the ProcessRunner.
Example plan (see configs/examples/plan.toml):
[[components]]
name = "keystone-server"
recipe = "configs/examples/com.keystone.server.recipe.toml"

Apply it:
# 1. Start the agent (now remote-first)
task build
./keystone --http :8080
# 2. Apply the plan remotely using the CLI
./keystonectl apply configs/examples/plan.toml

Notes:
- The example recipe uses the built-in keystoneserver binary.
- Artifact management and detached signatures are supported but optional for this simple example.
- ProcessRunner applies basic RLIMIT_NOFILE; cgroups integration is a safe no-op placeholder for now.
- Start agent:
./keystone --http :8080
- Apply plan:
./keystonectl apply configs/examples/plan.toml
- Health and discovery:
curl -s localhost:8080/healthz | jq
curl -s localhost:8080/v1/components | jq
curl -s localhost:8080/v1/plan/status | jq
- Stop all components:
curl -X POST localhost:8080/v1/plan/stop -i
- Stop or restart a single component:
curl -X POST localhost:8080/v1/components/hello:stop -i
curl -X POST localhost:8080/v1/components/hello:restart -i
- Metrics (Prometheus):
curl -s localhost:8080/metrics | head
- Provide a trust bundle (PEM) via KEYSTONE_TRUST_BUNDLE and a leaf certificate via KEYSTONE_LEAF_CERT, or include cert_uri in the recipe.
- Add sig_uri to each artifact entry in the recipe to enable signature verification.
- Signature format: detached signature over SHA-256 of the artifact, produced with OpenSSL (openssl dgst -sha256 -sign ...).
- See configs/trust/README.md for a quick, dev-friendly CA and signing walkthrough.
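To make the signature format above concrete, here is a rough Go sketch of verifying a detached signature over the SHA-256 digest of an artifact. It is not Keystone's actual verifier: the helper names (`verifyDetached`, `devSigner`, `signDetached`) are hypothetical, validation of the leaf certificate against the trust bundle is omitted, and the demo signs with a throwaway in-memory key rather than with OpenSSL.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// verifyDetached checks sig against the SHA-256 digest of artifact, using the
// public key of a PEM-encoded certificate. Validating that certificate against
// a trust bundle is deliberately left out of this sketch.
func verifyDetached(certPEM, artifact, sig []byte) error {
	block, _ := pem.Decode(certPEM)
	if block == nil {
		return errors.New("no PEM block in certificate")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return fmt.Errorf("parse certificate: %w", err)
	}
	digest := sha256.Sum256(artifact)
	switch pub := cert.PublicKey.(type) {
	case *ecdsa.PublicKey:
		// EC signatures from openssl are ASN.1 DER encoded.
		if !ecdsa.VerifyASN1(pub, digest[:], sig) {
			return errors.New("ECDSA signature mismatch")
		}
		return nil
	case *rsa.PublicKey:
		// openssl dgst -sign defaults to PKCS#1 v1.5 padding for RSA keys.
		return rsa.VerifyPKCS1v15(pub, crypto.SHA256, digest[:], sig)
	default:
		return fmt.Errorf("unsupported public key type %T", pub)
	}
}

// devSigner creates a throwaway ECDSA key and self-signed certificate so the
// example runs without files on disk (hypothetical helper, dev use only).
func devSigner() ([]byte, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "dev-signer"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), key, nil
}

// signDetached produces an ASN.1 ECDSA signature over the SHA-256 digest.
func signDetached(key *ecdsa.PrivateKey, artifact []byte) ([]byte, error) {
	digest := sha256.Sum256(artifact)
	return ecdsa.SignASN1(rand.Reader, key, digest[:])
}

func main() {
	certPEM, key, err := devSigner()
	if err != nil {
		panic(err)
	}
	artifact := []byte("hello artifact")
	sig, err := signDetached(key, artifact)
	if err != nil {
		panic(err)
	}
	fmt.Println("original verifies:", verifyDetached(certPEM, artifact, sig) == nil)
	fmt.Println("tampered verifies:", verifyDetached(certPEM, []byte("tampered"), sig) == nil)
}
```

A real verifier would additionally build a certificate chain from the leaf to the trust bundle before trusting its public key.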
- Configure per-artifact HTTP headers directly in the recipe under [[artifacts]].headers.
- For GitHub artifacts (github.com or api.github.com), set github_token (at the same level as uri) to inject Authorization: Bearer <token> when no Authorization header is provided.
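The header rules above can be sketched in Go as follows. This illustrates the described behavior under assumptions and is not the agent's code: `applyArtifactHeaders` and `authHeaderFor` are hypothetical names, and matching the host exactly against `github.com` and `api.github.com` is one reading of the rule.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// applyArtifactHeaders copies the recipe headers onto the request, then adds a
// Bearer token for GitHub hosts only when no Authorization header was provided.
func applyArtifactHeaders(req *http.Request, headers map[string]string, githubToken string) {
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	host := strings.ToLower(req.URL.Hostname())
	isGitHub := host == "github.com" || host == "api.github.com"
	if isGitHub && githubToken != "" && req.Header.Get("Authorization") == "" {
		req.Header.Set("Authorization", "Bearer "+githubToken)
	}
}

// authHeaderFor builds a GET request for rawURL and reports the resulting
// Authorization header, which makes the rule easy to inspect.
func authHeaderFor(rawURL string, headers map[string]string, token string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return "", err
	}
	applyArtifactHeaders(req, headers, token)
	return req.Header.Get("Authorization"), nil
}

func main() {
	urls := []string{
		"https://api.github.com/repos/org/repo/actions/artifacts/123/zip",
		"https://example.com/artifacts/hello.tar.gz",
	}
	for _, u := range urls {
		auth, err := authHeaderFor(u, map[string]string{"Accept": "application/vnd.github+json"}, "example-token")
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> Authorization=%q\n", u, auth)
	}
}
```

Restricting injection to known GitHub hosts keeps the token from leaking to unrelated artifact servers.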
Example snippet inside a recipe:
[[artifacts]]
uri = "https://api.github.com/repos/org/repo/actions/artifacts/123/zip"
sha256 = "sha256:<...>"
unpack = true
github_token = ""
[artifacts.headers]
Accept = "application/vnd.github+json" # for the Actions /artifacts/{id}/zip endpoint (302 redirect to S3)

Build and use the local CLI for convenience:
go build -o keystonectl ./cmd/keystonectl
./keystonectl status
./keystonectl components
./keystonectl stop-plan
./keystonectl stop hello
./keystonectl restart hello
./keystonectl graph
./keystonectl restart-dry hello
./keystonectl apply-dry configs/examples/plan.toml

A simple HTTP server to serve local artifacts for testing:
go run ./cmd/keystoneserver --root ./artifacts --addr :9000
- Accessible at http://localhost:9000/<path>
- Includes a /healthz endpoint.
We provide a Bruno collection for testing the agent API.
- Install Bruno.
- Open the app and select Open Collection.
- Select the bruno/ folder in this repository.
- Use the local environment to set the base_url.
Graph and dry-run from API directly:
curl -s localhost:8080/v1/plan/graph | jq
curl -s -X POST 'localhost:8080/v1/components/hello:restart?dry=true' | jq
curl -s -X POST localhost:8080/v1/plan/apply -H 'Content-Type: application/json' \
-d '{"planPath":"configs/examples/plan.toml","dry":true}'

Keystone uses GoReleaser for automated builds and releases. To trigger a new release:
- Tag the commit: git tag -a v0.1.0 -m "Release v0.1.0"
- Push the tag: git push origin v0.1.0
The GitHub Action will automatically build the binaries for multiple architectures (amd64, arm64, armv7) and create a GitHub Release with the artifacts.
Keystone supports loading environment variables from a .env file in the current working directory.
| Variable | Description |
|---|---|
| KEYSTONE_ARTIFACT_CACHE_LIMIT_BYTES | Max size of runtime/artifacts (default: 2GiB). |
| KEYSTONE_ARTIFACT_DOWNLOAD_TIMEOUT | Artifact download timeout (default: 30m). Supports "5m", "1h", etc. |
| KEYSTONE_TRUST_BUNDLE | Path to CA trust bundle (PEM) for signature verification. |
| KEYSTONE_LEAF_CERT | Default certificate (PEM) for signature verification if not in recipe. |
| KEYSTONE_GITHUB_TOKEN | Default token for GitHub artifact downloads (if not in recipe). |
| KEYSTONE_DEVICE_ID | Device ID for NATS/MQTT topics (default: hostname). |
| KEYSTONE_MQTT_BROKER | MQTT broker URL (enables MQTT if set and --mqtt-broker not passed). |
| KEYSTONE_MQTT_DEVICE_ID | MQTT device ID (overrides KEYSTONE_DEVICE_ID for MQTT only). |
| KEYSTONE_MQTT_CLIENT_ID | MQTT client ID. |
| KEYSTONE_MQTT_TLS_CERT | Path to MQTT client TLS certificate. |
| KEYSTONE_MQTT_TLS_KEY | Path to MQTT client TLS private key. |
| KEYSTONE_MQTT_TLS_CA | Path to MQTT CA certificate bundle. |
| KEYSTONE_MQTT_TLS_VERIFY | Verify MQTT broker certificate (true/false). |
| KEYSTONE_MQTT_USER | MQTT username. |
| KEYSTONE_MQTT_PASS | MQTT password. |
| KEYSTONE_MQTT_QOS | MQTT QoS for command/response (0, 1, 2). |
| KEYSTONE_MQTT_STATE_INTERVAL | MQTT state event interval (e.g. 10s, 0 to disable). |
| KEYSTONE_MQTT_HEALTH_INTERVAL | MQTT health event interval (e.g. 30s, 0 to disable). |
| KEYSTONE_INSTALL_TIMEOUT | Install phase timeout (default: 2m). Supports duration strings. |
| KEYSTONE_CONTAINERD_SOCKET | containerd socket path (default: /run/containerd/containerd.sock). |
| KEYSTONE_CONTAINERD_NAMESPACE | containerd namespace for containers (default: keystone). |
| KEYSTONE_CONTAINER_SNAPSHOTTER | Snapshotter for container images (default: overlayfs). |
| KEYSTONE_CONTAINER_REGISTRY | Default container registry (default: docker.io). |
| KEYSTONE_CNI_CONF_DIR | CNI config directory for bridge mode (default: /etc/cni/net.d). |
| KEYSTONE_CNI_PLUGIN_DIRS | CNI plugin dirs (default: /opt/cni/bin:/usr/lib/cni). |
| KEYSTONE_CNI_NETNS_DIR | Netns dir for bridge mode (default: /var/run/netns). |
Keystone features a robust artifact download system designed for unreliable edge networks:
| Feature | Description |
|---|---|
| Resume Support | Automatic resume via HTTP Range headers if download is interrupted |
| Exponential Backoff | Retries with jitter to avoid thundering herd (1s-30s, 10 attempts) |
| Progress Tracking | Real-time download progress with speed and ETA |
| Timeout Control | Configurable connect, read, and overall timeouts |
| Atomic Operations | Downloads to .partial file, renamed on completion |
| SHA-256 Verification | Post-download integrity check |
| Error Classification | Distinguishes fatal (4xx) from retryable (5xx, network) errors |
| Rate Limit Handling | Respects Retry-After headers for 429 responses |
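The retry behavior in the table can be illustrated with a short Go sketch. The 1s and 30s bounds and the 10 attempts come from the table; doubling per attempt and the "equal jitter" strategy are assumptions, since the README does not state the exact formula. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay   = 1 * time.Second  // lower bound from the table
	maxDelay    = 30 * time.Second // upper bound from the table
	maxAttempts = 10               // attempt count from the table
)

// backoffCeiling returns the un-jittered delay for a 0-based attempt:
// the base delay doubled once per attempt, capped at maxDelay.
func backoffCeiling(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// jittered picks a random delay in [ceiling/2, ceiling] so that many devices
// retrying at once do not hit the server in lockstep (assumed strategy).
func jittered(ceiling time.Duration) time.Duration {
	half := ceiling / 2
	return half + time.Duration(rand.Int63n(int64(half)+1))
}

// retryable classifies a failed HTTP status: 429 and 5xx are retried,
// every other 4xx is treated as fatal.
func retryable(status int) bool {
	return status == 429 || status >= 500
}

func main() {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		ceiling := backoffCeiling(attempt)
		sleep := jittered(ceiling).Round(time.Millisecond)
		fmt.Printf("attempt %2d: ceiling=%s sleep=%s\n", attempt+1, ceiling, sleep)
	}
	fmt.Println("retry 503:", retryable(503), "retry 404:", retryable(404))
}
```

For a 429 response carrying Retry-After, a downloader would typically wait for the server-provided delay instead of the computed one.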
Keystone uses a pluggable adapter architecture for control plane communication. Multiple adapters can run simultaneously.
| Adapter | Protocol | Use Case | Default |
|---|---|---|---|
| HTTP | REST API | Local management, debugging, Prometheus | Enabled (:8080) |
| NATS | Pub/Sub | Cloud-scale fleet management | Disabled |
| MQTT | IoT messaging | AWS IoT Core, edge gateways | Disabled |
For complete adapter documentation, see docs/adapters.md.
The HTTP adapter exposes a REST API for local management:
# Default: enabled on port 8080
./keystone --http :8080
# Disable HTTP (use only messaging adapters)
./keystone --http "" --nats-url nats://server:4222 --nats-device-id edge-001

Key Endpoints:
| Endpoint | Description |
|---|---|
| GET /healthz | Health check |
| GET /metrics | Prometheus metrics |
| GET /v1/components | List components |
| POST /v1/plan/apply | Apply deployment plan |
| POST /v1/components/{name}:restart | Restart component |
Enable NATS for asynchronous fleet management with optional JetStream persistence:
# Basic NATS
./keystone --http :8080 \
--nats-url nats://control-plane:4222 \
--nats-device-id edge-001
# With mTLS and JetStream
./keystone --http :8080 \
--nats-url nats://control-plane:4222 \
--nats-device-id edge-001 \
--nats-tls-cert /etc/keystone/certs/client.crt \
--nats-tls-key /etc/keystone/certs/client.key \
--nats-tls-ca /etc/keystone/certs/ca.crt \
--nats-jetstream

Key Features:
- mTLS, NKey, Token, User/Pass authentication
- JetStream for durable job queues (survives disconnections)
- Subjects: keystone.{deviceId}.cmd.*, keystone.{deviceId}.events.*
Authentication Priority: NKey > Credentials > Token > User/Pass
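The priority order can be read as a first-match rule over the configured credentials. A minimal Go sketch, assuming a hypothetical `natsAuth` struct whose fields correspond to the --nats-nkey, --nats-creds, --nats-token and --nats-user/--nats-pass flags:

```go
package main

import "fmt"

// natsAuth holds the authentication-related settings. The struct and its
// field names are illustrative, not Keystone's internal types.
type natsAuth struct {
	NKeySeedFile string // --nats-nkey
	CredsFile    string // --nats-creds
	Token        string // --nats-token
	User, Pass   string // --nats-user / --nats-pass
}

// method returns the first configured mechanism in priority order:
// NKey > Credentials > Token > User/Pass.
func (a natsAuth) method() string {
	switch {
	case a.NKeySeedFile != "":
		return "nkey"
	case a.CredsFile != "":
		return "creds"
	case a.Token != "":
		return "token"
	case a.User != "":
		return "userpass"
	default:
		return "none"
	}
}

func main() {
	cfg := natsAuth{CredsFile: "/etc/keystone/device.creds", Token: "ignored-token"}
	fmt.Println("selected auth:", cfg.method())
}
```

Under this reading, setting several mechanisms at once is not an error: lower-priority values are simply ignored.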
Enable MQTT for IoT-friendly communication with brokers like Mosquitto, EMQX, or AWS IoT Core:
# Basic MQTT
./keystone --http :8080 \
--mqtt-broker tcp://broker:1883 \
--mqtt-device-id edge-001
# With TLS and auth
./keystone --http :8080 \
--mqtt-broker ssl://broker:8883 \
--mqtt-device-id edge-001 \
--mqtt-tls-ca /etc/keystone/certs/ca.crt \
--mqtt-user agent --mqtt-pass secret

Key Features:
- mTLS, User/Pass authentication
- Configurable QoS (0, 1, 2)
- Last Will and Testament for online/offline detection
- Topics: keystone/{deviceId}/cmd/*, keystone/{deviceId}/resp/*, keystone/{deviceId}/events/*
# HTTP + NATS + MQTT simultaneously
./keystone --http :8080 \
--nats-url nats://nats.internal:4222 --nats-device-id edge-001 \
--mqtt-broker tcp://mqtt.internal:1883 --mqtt-device-id edge-001

HTTP Adapter Flags
| Flag | Default | Description |
|---|---|---|
| --http | :8080 | HTTP listen address (empty to disable) |
NATS Adapter Flags
| Flag | Default | Description |
|---|---|---|
| --nats-url | (empty) | NATS server URL (empty to disable) |
| --nats-device-id | hostname | Device ID for subjects |
| --nats-tls-cert | (empty) | Client TLS certificate path |
| --nats-tls-key | (empty) | Client TLS key path |
| --nats-tls-ca | (empty) | CA certificate path |
| --nats-tls-verify | true | Verify server certificate |
| --nats-creds | (empty) | Credentials file path (.creds) |
| --nats-nkey | (empty) | NKey seed file path |
| --nats-token | (empty) | Authentication token |
| --nats-user | (empty) | Username |
| --nats-pass | (empty) | Password |
| --nats-state-interval | 10s | State event interval (0 to disable) |
| --nats-health-interval | 30s | Health event interval (0 to disable) |
| --nats-jetstream | false | Enable JetStream |
| --nats-js-stream | KEYSTONE_JOBS | JetStream stream name |
| --nats-js-workers | 1 | Job processor workers |
MQTT Adapter Flags
| Flag | Default | Description |
|---|---|---|
| --mqtt-broker | (empty) | MQTT broker URL (empty to disable) |
| --mqtt-device-id | hostname | Device ID for topics |
| --mqtt-client-id | keystone-{device-id} | MQTT client ID |
| --mqtt-tls-cert | (empty) | Client TLS certificate path |
| --mqtt-tls-key | (empty) | Client TLS key path |
| --mqtt-tls-ca | (empty) | CA certificate path |
| --mqtt-tls-verify | true | Verify server certificate |
| --mqtt-user | (empty) | Username |
| --mqtt-pass | (empty) | Password |
| --mqtt-qos | 1 | QoS level (0, 1, 2) |
| --mqtt-state-interval | 10s | State event interval (0 to disable) |
| --mqtt-health-interval | 30s | Health event interval (0 to disable) |
Environment variable equivalents are also supported (flags take precedence):
KEYSTONE_MQTT_BROKER, KEYSTONE_MQTT_DEVICE_ID, KEYSTONE_MQTT_CLIENT_ID,
KEYSTONE_MQTT_TLS_CERT, KEYSTONE_MQTT_TLS_KEY, KEYSTONE_MQTT_TLS_CA,
KEYSTONE_MQTT_TLS_VERIFY, KEYSTONE_MQTT_USER, KEYSTONE_MQTT_PASS,
KEYSTONE_MQTT_QOS, KEYSTONE_MQTT_STATE_INTERVAL, KEYSTONE_MQTT_HEALTH_INTERVAL.
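One common way to implement "flags take precedence" with Go's standard flag package is to check whether the flag was explicitly set and fall back to the environment otherwise. This is a sketch of that pattern for the broker setting only. `resolveBroker` is a hypothetical helper, and treating an explicitly empty flag as "disabled" is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolveBroker returns the MQTT broker URL. An explicitly passed
// --mqtt-broker wins, otherwise KEYSTONE_MQTT_BROKER is used.
// getenv is injected so the lookup can be exercised without touching
// the real process environment.
func resolveBroker(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("keystone", flag.ContinueOnError)
	broker := fs.String("mqtt-broker", "", "MQTT broker URL (empty to disable)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	explicit := false
	// Visit only walks flags that were actually set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "mqtt-broker" {
			explicit = true
		}
	})
	if explicit {
		return *broker, nil
	}
	return getenv("KEYSTONE_MQTT_BROKER"), nil
}

func main() {
	broker, err := resolveBroker(os.Args[1:], os.Getenv)
	if err != nil {
		os.Exit(2)
	}
	if broker == "" {
		fmt.Println("MQTT adapter disabled")
		return
	}
	fmt.Println("MQTT broker:", broker)
}
```

Checking for an explicit flag rather than a non-empty value lets an operator override an environment default with an empty flag.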
Run once after cloning to enable the repo’s versioned hooks:
task hooks # or: ./scripts/setup-git-hooks.sh

This sets core.hooksPath to .githooks, where the pre-commit hook runs go fmt ./... and stages formatting changes automatically.
| Concept | Description |
|---|---|
| Recipe | TOML file describing a component: artifacts, lifecycle hooks, health checks, resources |
| Deployment Plan | TOML file listing components to run, resolved as a DAG with dependencies |
| Supervisor | Enforces lifecycle (install → start → running → stop) and restart policies |
| Runner | Executes components: ProcessRunner (native) or ContainerRunner (containerd/CLI) |
| Artifact Manager | Downloads, verifies (SHA-256 + signatures), caches, and garbage collects artifacts |
| Adapter | Pluggable control plane interface (HTTP, NATS, MQTT) for remote management |
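To show how a deployment plan can be resolved as a DAG, the sketch below groups components into layers with Kahn's algorithm: every component lands in a later layer than all of its dependencies, so the members of one layer could be started in parallel. It is a generic illustration, not Keystone's supervisor code. The `layers` function and the map-based input are hypothetical, and the component names reuse the db -> cache -> api demo stack.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// layers takes a map of component name -> names it depends on and returns
// the components grouped into dependency-ordered layers.
func layers(deps map[string][]string) ([][]string, error) {
	indegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string)
	for name, ds := range deps {
		indegree[name] = len(ds)
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("%s depends on unknown component %s", name, d)
			}
			dependents[d] = append(dependents[d], name)
		}
	}
	var out [][]string
	for len(indegree) > 0 {
		// Collect every remaining component whose dependencies are all placed.
		var layer []string
		for name, n := range indegree {
			if n == 0 {
				layer = append(layer, name)
			}
		}
		if len(layer) == 0 {
			return nil, errors.New("dependency cycle detected")
		}
		sort.Strings(layer) // deterministic output
		for _, name := range layer {
			delete(indegree, name)
			for _, dep := range dependents[name] {
				indegree[dep]--
			}
		}
		out = append(out, layer)
	}
	return out, nil
}

func main() {
	plan := map[string][]string{
		"db":    nil,
		"cache": {"db"},
		"api":   {"cache"},
	}
	got, err := layers(plan)
	if err != nil {
		panic(err)
	}
	for i, layer := range got {
		fmt.Printf("layer %d: %v\n", i, layer)
	}
}
```

Reversing the layer order gives a natural stop or rollback sequence, since dependents are then stopped before the components they rely on.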
See configs/systemd/keystone.service for a hardened example unit file.
We welcome contributors who care about reliability at the edge.
- Issues: bug reports, design discussions, small enhancements
- PRs: focused, well-tested changes; prefer conventional commits
- Security: please report privately first when appropriate
📃 Apache-2.0
