Releases: Rocketgraph/rocketgraph
🚀 Rocketgraph v0.1.0 - Compress billions of logs into tiny snapshots to detect anomalies
This release opens up the full Rocketgraph stack: the ML log-clustering and streaming anomaly detection engine, an AI agent that auto-instruments any Node.js service with OpenTelemetry in ~90 seconds, and a license transition to BSL 1.1 → AGPL-3.0 (2030-05-27).
It's the first version where the entire pipeline — from "I have an uninstrumented Node app" to "I'm getting ranked anomaly alerts" — runs end-to-end out of the repo, with no SaaS dependency.
🧠 Rocketgraph ML — clustering + streaming anomaly detection
A stateless FastAPI service (ml/) that points at the observability platform you already pay for, mines structural log templates, and flags the ones that are statistically unusual. Three deterministic algorithms in sequence — no LLM, no labels, fully reproducible:
| Stage | Algorithm | Why |
|---|---|---|
| Template mining | Drain3 (He et al., ICWS 2017) | O(N) online parser. ~25k logs/sec/core. Fixed-depth parse tree over masked lines. |
| Per-template scoring | Isolation Forest (Liu et al., 2008) | Unsupervised, linear in N. Features: [log_count, error_rate, warn_rate, unique_services, token_count] per template per service. |
| Streaming detection | Half-Space-Trees (Tan et al., IJCAI 2011) | True online ensemble. Constant memory per service, sub-ms scoring per event. No retraining cycle. |
| 2-D layout | TF-IDF + PCA → MinMax | Templates project to (x, y) ∈ [5, 95] for drop-in scatter plots. |
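
The final row of the table, rescaling projected coordinates into [5, 95], can be illustrated with a small standalone sketch. This is not the service's own code: the function name `rescale_to_plot_range` is hypothetical, and it assumes each axis is min-max scaled independently, which is what a standard MinMax scaler does.

```python
def rescale_to_plot_range(points, lo=5.0, hi=95.0):
    """Min-max scale each axis of 2-D points independently into [lo, hi].

    `points` is a list of (x, y) tuples, e.g. the first two PCA components
    of each template's TF-IDF vector (assumed input, not the real pipeline).
    """
    if not points:
        return []

    def scale(values):
        v_min, v_max = min(values), max(values)
        span = v_max - v_min
        if span == 0:
            # All values identical: place them in the middle of the range.
            return [(lo + hi) / 2.0] * len(values)
        return [lo + (v - v_min) / span * (hi - lo) for v in values]

    xs = scale([p[0] for p in points])
    ys = scale([p[1] for p in points])
    return list(zip(xs, ys))


if __name__ == "__main__":
    for x, y in rescale_to_plot_range([(-2.0, 0.0), (0.0, 1.0), (2.0, 3.0)]):
        print(f"{x:.1f}, {y:.1f}")
```

Keeping a 5-unit margin on each side means points never sit exactly on the plot border, which is presumably why the range is [5, 95] rather than [0, 100].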
Measured performance
Single container, 4 vCPU, 8 GB RAM, against a real production-shaped workload:
- 2,002,271 raw logs / 9 services → 58 Drain3 templates → 9 anomalies in ~90s wall-clock. The 90s is bottlenecked on HTTP fetch from ClickHouse, not on the ML.
- Drain3: ~25,000 logs/sec/core.
- Isolation Forest: <50 ms per service for ≤500 templates.
- Half-Space-Trees: sub-millisecond per event.
- Memory footprint: bounded by template count, not log count.
Anomaly signals
Every flagged row carries a reasons array, so downstream alerting can route deterministically:
| reason | trigger |
|---|---|
| `anomaly_score` | HST score ≥ `HST_THRESHOLD` (default 0.7) for that service |
| `new_template` | Drain has never seen this template for that service, level ≥ warn |
| `error_burst` | ≥ 60% errors in the last 60s for that service |
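
The three triggers can be read as independent checks that each append one entry to `reasons`. The sketch below is an illustration under assumptions, not the engine's implementation: `build_reasons`, its arguments, and the level ordering in `LEVEL_RANK` are hypothetical, and the caller is assumed to supply the levels of the logs seen for that service in the last 60 seconds.

```python
HST_THRESHOLD = 0.7          # default from the Tunables table
ERROR_BURST_FRACTION = 0.6   # ">= 60% errors"

# Assumed severity ordering; the real service may rank levels differently.
LEVEL_RANK = {"debug": 0, "info": 1, "warn": 2, "error": 3}


def build_reasons(hst_score, is_new_template, level, recent_levels):
    """Return the list of reasons that apply to one log row.

    hst_score       -- Half-Space-Trees score for the row's service
    is_new_template -- True if this template was never seen for the service
    level           -- level of the row itself ("info", "warn", ...)
    recent_levels   -- levels of that service's logs in the last 60 seconds
    """
    reasons = []
    if hst_score >= HST_THRESHOLD:
        reasons.append("anomaly_score")
    if is_new_template and LEVEL_RANK.get(level, 0) >= LEVEL_RANK["warn"]:
        reasons.append("new_template")
    if recent_levels:
        errors = sum(1 for lvl in recent_levels if lvl == "error")
        if errors / len(recent_levels) >= ERROR_BURST_FRACTION:
            reasons.append("error_burst")
    return reasons


if __name__ == "__main__":
    print(build_reasons(0.82, True, "error", ["error", "error", "info"]))
```

Because the output is a plain list of fixed strings, an alert router can branch on membership (for example, page on `error_burst`, open a ticket on `new_template`) without parsing free text.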
Connectors
Six platforms supported in this release, all returning the same row shape {timestamp, message, level, service} so the downstream pipeline is identical regardless of source:
- New Relic (NerdGraph / NRQL)
- Grafana Loki (LogQL, bearer auth)
- Datadog (API + App key)
- AWS CloudWatch Logs (IAM, log group + optional stream)
- Sentry (user auth token, org + project)
- ClickHouse (HTTP basic auth, configurable column mapping)
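
To make the shared row shape concrete, here is a minimal sketch of how a source-specific record could be mapped to `{timestamp, message, level, service}`. The helper `normalize_row`, the `column_map` argument, and the sample field names (`ts`, `body`, `severity`, `svc`) are all hypothetical; the real connectors may normalize differently.

```python
from datetime import datetime, timezone

COMMON_FIELDS = ("timestamp", "message", "level", "service")


def normalize_row(raw, column_map):
    """Map a source-specific record onto the common four-field row shape.

    column_map maps each common field to the key used by the source,
    in the spirit of the ClickHouse connector's configurable column mapping.
    """
    row = {field: raw.get(column_map.get(field, field)) for field in COMMON_FIELDS}

    # Normalize level casing and a common alias (assumed convention).
    level = str(row["level"] or "info").lower()
    row["level"] = "warn" if level == "warning" else level

    # Convert epoch seconds to an ISO 8601 UTC string; leave strings untouched.
    ts = row["timestamp"]
    if isinstance(ts, (int, float)):
        row["timestamp"] = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    return row


if __name__ == "__main__":
    raw = {"ts": 0, "body": "db timeout", "severity": "WARNING", "svc": "checkout"}
    mapping = {"timestamp": "ts", "message": "body", "level": "severity", "service": "svc"}
    print(normalize_row(raw, mapping))
```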
OpenTelemetry isn't a connector — route OTLP into ClickHouse or Loki via a standard collector config, then point Rocketgraph at that. Minimal collector config is in ml/README.md.
Roadmap: Splunk, Elastic / OpenSearch, Azure Monitor, GCP Cloud Logging.
HTTP API
| Method | Endpoint | Purpose |
|---|---|---|
| GET | `/clusters` | Cluster a window of logs. Returns templates with anomaly scores and 2-D coords. |
| POST | `/clusters/train` | Same as `/clusters`, plus warms the per-service HST model on the same window. |
| POST | `/anomalies/detect` | Score new logs (inline JSON or fetched via a connector) against the trained HST. Returns only the anomalous rows. |
| POST | `/credentials` | Dynamic credentials: override `.env` at runtime, per-tenant, no restart. |
| GET | `/credentials` | Inspect which sources are configured. |
| POST | `/detector/reset` | Wipe the trained HST. |
| GET | `/health` | Liveness. |
Time-window flags: 1h, 6h, 12h, 24h, 1d, 7d, or absolute start=<ISO>&end=<ISO> (ClickHouse).
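
As a rough illustration of how a client might assemble a `/clusters` request, the sketch below builds the query string only and makes no network call. The `source` parameter appears elsewhere in these notes (`/clusters?source=clickhouse`); the parameter name `window`, the base URL, and the helper `clusters_url` are assumptions, so check `ml/README.md` for the actual names.

```python
from urllib.parse import urlencode

# Relative window flags listed in the release notes.
RELATIVE_WINDOWS = {"1h", "6h", "12h", "24h", "1d", "7d"}


def clusters_url(base_url, source, window=None, start=None, end=None):
    """Build a GET /clusters URL for a relative or absolute time window.

    `window` is an assumed query-parameter name; `start`/`end` are ISO
    timestamps (absolute ranges are documented for ClickHouse).
    """
    params = {"source": source}
    if start and end:
        params["start"] = start
        params["end"] = end
    elif window is not None:
        if window not in RELATIVE_WINDOWS:
            raise ValueError(f"unsupported window flag: {window!r}")
        params["window"] = window
    # With neither given, the server is expected to fall back to
    # DEFAULT_LOOKBACK_HOURS.
    return f"{base_url.rstrip('/')}/clusters?{urlencode(params)}"


if __name__ == "__main__":
    # The host and port here are placeholders.
    print(clusters_url("http://localhost:8000", "clickhouse", window="6h"))
```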
Tunables
| Env var | Default | Purpose |
|---|---|---|
| `DRAIN_SIM_TH` | 0.4 | Drain3 similarity threshold (lower → fewer, broader templates). |
| `ANOMALY_CONTAMINATION` | 0.1 | Isolation Forest expected anomaly fraction. |
| `HST_THRESHOLD` | 0.7 | Half-Space-Trees anomaly cutoff. |
| `DEFAULT_LOOKBACK_HOURS` | 6 | Used when window is omitted. |
| `MAX_ROWS` | 100000 | Per-fetch cap. Bump for high-volume training windows. |
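
For readers wiring these variables into their own tooling, the following sketch shows one way to read them with the documented defaults. The `Tunables` dataclass and `load_tunables` function are hypothetical names for illustration; only the environment variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunables:
    drain_sim_th: float
    anomaly_contamination: float
    hst_threshold: float
    default_lookback_hours: int
    max_rows: int


def load_tunables(env=None):
    """Read tunables from a mapping (os.environ by default), using the
    defaults documented in the release notes when a variable is unset."""
    env = os.environ if env is None else env
    return Tunables(
        drain_sim_th=float(env.get("DRAIN_SIM_TH", "0.4")),
        anomaly_contamination=float(env.get("ANOMALY_CONTAMINATION", "0.1")),
        hst_threshold=float(env.get("HST_THRESHOLD", "0.7")),
        default_lookback_hours=int(env.get("DEFAULT_LOOKBACK_HOURS", "6")),
        max_rows=int(env.get("MAX_ROWS", "100000")),
    )


if __name__ == "__main__":
    # Example: raise the per-fetch cap for a large training window.
    print(load_tunables({"MAX_ROWS": "500000"}))
```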
Deployment
- Single stateless container. `docker compose up` is the whole install. No DB, no auth provider, no agents on hosts.
- VPC-only. No outbound traffic except to the connectors you configure. The container does not phone home.
- Secrets. `.env` for static creds, `POST /credentials` for dynamic/per-tenant creds. Held in memory only, never written to disk or logs.
- Optional signing middleware. `SIGNING_SECRET` gates every endpoint behind an `X-Signing-Secret` header.
- Air-gap-ready. Designed for FedRAMP / HIPAA / SOC2-style environments.
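
The signing gate can be pictured as a single check run before every handler. The sketch below is one plausible shape, not the shipped middleware: it assumes the `X-Signing-Secret` header carries the shared secret verbatim and that leaving `SIGNING_SECRET` unset disables the gate. The function name `is_request_authorized` is hypothetical.

```python
import hmac


def is_request_authorized(headers, signing_secret):
    """Return True if the request may proceed.

    headers        -- mapping of request header names to values
    signing_secret -- value of SIGNING_SECRET, or None/"" when not configured
    """
    if not signing_secret:
        # Assumption: the middleware is optional, so no secret means no gate.
        return True

    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get("x-signing-secret", "")

    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(supplied.encode(), signing_secret.encode())


if __name__ == "__main__":
    print(is_request_authorized({"X-Signing-Secret": "s3cret"}, "s3cret"))
```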
⚡ @rgraph/otel-node — AI agent for OpenTelemetry instrumentation
Most teams want anomaly detection but lack the upstream pipeline that produces structured telemetry in the first place. @rgraph/otel-node (packages/otel-node) closes that gap.
export ROCKETGRAPH_API_KEY=rg_live_xxxxxxxxxxxx
cd ~/your-node-service
npx @rgraph/otel-node init

What it does on that single command:
- Reads `package.json` and the lockfile to detect framework, language (TS/JS), and package manager (npm/yarn/pnpm/bun).
- Scans the dependency tree to find every HTTP, DB, queue, and cache client in use.
- Writes (or merges into, with a `.bak`) an `instrumentation.ts`/`.js` tailored to that exact stack.
- Installs the required `@opentelemetry/*` packages using the detected package manager.
- Prints the exact `--require`/`--import` flag to wire the file in. For Next.js, picks up `experimental.instrumentationHook` automatically.
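
The lockfile-based package manager detection in the first step can be sketched in a few lines. This is an illustration in Python rather than the tool's own Node.js code; the function name, the priority order when several lockfiles coexist, and the fallback to npm are assumptions. The lockfile names themselves are the standard ones each package manager writes.

```python
# Checked in order; the priority when several lockfiles coexist is an assumption.
LOCKFILES = [
    ("pnpm-lock.yaml", "pnpm"),
    ("yarn.lock", "yarn"),
    ("bun.lockb", "bun"),   # binary lockfile written by older Bun releases
    ("bun.lock", "bun"),    # text lockfile written by newer Bun releases
    ("package-lock.json", "npm"),
]


def detect_package_manager(filenames):
    """Guess the package manager from the file names in a project root."""
    present = set(filenames)
    for lockfile, manager in LOCKFILES:
        if lockfile in present:
            return manager
    # Assumed fallback: no lockfile found, so default to npm.
    return "npm"


if __name__ == "__main__":
    import os
    print(detect_package_manager(os.listdir(".")))
```

For auditing what the real detector sees, the `otel-node detect` command described under Modes prints its own JSON report.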
Detected stacks
| Category | Detected |
|---|---|
| Frameworks | Express, Fastify, NestJS, Koa, Hapi, Restify, Next.js, Nuxt |
| HTTP / RPC | http, https, grpc, @grpc/grpc-js |
| Databases | pg, mysql2, mongodb, mongoose, redis, ioredis, prisma |
| Queues / cloud | amqplib, kafkajs, aws-sdk, @aws-sdk/* |
| Package managers | npm, yarn, pnpm, bun (auto from lockfile) |
| Languages | TypeScript, JavaScript |
Modes
- Agent mode (default): Claude-powered, reads your code, writes the right instrumentation file. No templates, no manual `@opentelemetry/instrumentation-*` package selection.
- Legacy template mode (`--legacy`): deterministic, no LLM. Use this in CI where reproducibility matters.
- `--dry-run --legacy`: print the file that would be written + the packages that would be installed. No changes.
- `otel-node instrument`: the agent goes further and adds structured error handlers, span attributes, and observability code throughout the app.
- `otel-node detect`: JSON report of what the detector sees. Useful for debugging or auditing in CI.
- `otel-node uninstall`: removes the generated file and `.bak`. Leaves OTel packages installed.
Where it fits
Your Node service OTel Collector Sink Rocketgraph ML
───────────────── ───────────── ──── ──────────────
@rgraph/otel-node init ──> OTLP HTTP / gRPC ──> ClickHouse ──> /clusters?source=clickhouse
(writes instrumentation) (or any platform) Loki /anomalies/detect
Datadog
New Relic
The agent only owns the leftmost step — getting telemetry out of your service over standards-compliant OTLP. The right half is whatever observability platform you already pay for. No proprietary protocol, no custom SDK, no lock-in.
🎨 Rebrand to Rocketgraph
Project consolidated under the Rocketgraph name (formerly rgraph / RocketsGraphQL). New logo, new domain at rocketgraph.app, new docs structure. Engine and APIs unchanged — only branding and packaging.
The README now ships a real demo GIF (images/logs-snapshot.gif) showing the 2M-logs → 58-templates pipeline in motion, so you can see what the engine actually does without cloning anything.
📜 License change — BSL 1.1, converts to AGPL-3.0 on 2030-05-27
Rocketgraph is moving from AGPL-3.0 to Business Source License 1.1 with a Change Date of 2030-05-27, at which point the license converts to AGPL-3.0 automatically.
What you can do under BSL today (free, no cost):
- Use Rocketgraph in production, internally, at any scale.
- Modify it, fork it, embed it in your stack.
- Self-host it inside your own infrastructure.
What you can't do under BSL today (until the Change Date):
- Offer Rocketgraph as a competing hosted / managed service to third parties.
On 2030-05-27, the BSL terms expire and every version up to that point becomes AGPL-3.0. No action required from you. Same protection model the AGPL gave you, with a 4-year window of source-available commercial protection in front of it.
The license shift is the standard "MariaDB / Sentry / CockroachDB" pattern — designed to keep the project self-hostable and auditable while leaving room to fund continued development.
Full text: LICENSE.txt.
🔗 Community
- 💬 Discord moved. New permanent invite: https://discord.gg/dqwkEpSc. The old `YHVnZ5WT` invite is deprecated.
- 🐛 GitHub Issues: bugs and feature requests.
- 🐦 @RGraphql — release note...