
🚀 Rocketgraph v0.1.0 - Compress billions of logs into tiny snapshots to detect anomalies

27 May 14:03

This release opens up the full Rocketgraph stack: the ML log-clustering and streaming anomaly-detection engine, an AI agent that auto-instruments any Node.js service with OpenTelemetry in about 90 seconds, and a license change to BSL 1.1, which converts to AGPL-3.0 on 2030-05-27.

It's the first version where the entire pipeline — from "I have an uninstrumented Node app" to "I'm getting ranked anomaly alerts" — runs end-to-end out of the repo, with no SaaS dependency.


🧠 Rocketgraph ML — clustering + streaming anomaly detection

Rocketgraph ML is a stateless FastAPI service (ml/) that points at the observability platform you already pay for, mines structural log templates, and flags the ones that are statistically unusual. Three algorithms run in sequence, followed by a 2-D layout step for plotting. No LLM, no labels, fully reproducible:

| Stage | Algorithm | Why |
| --- | --- | --- |
| Template mining | Drain3, an implementation of Drain (He et al., ICWS 2017) | O(N) online parser, ~25k logs/sec/core. Fixed-depth parse tree over masked lines. |
| Per-template scoring | Isolation Forest (Liu et al., ICDM 2008) | Unsupervised, linear in N. Features per template per service: [log_count, error_rate, warn_rate, unique_services, token_count]. |
| Streaming detection | Half-Space-Trees (Tan et al., IJCAI 2011) | True online ensemble. Constant memory per service, sub-ms scoring per event. No retraining cycle. |
| 2-D layout | TF-IDF + PCA → MinMax | Templates project to (x, y) ∈ [5, 95] for drop-in scatter plots. |
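
To make the template-mining stage concrete, here is a minimal, stdlib-only sketch of the Drain idea: mask variable fields, route each line through a fixed-depth prefix (token count, then first token), and merge it into the most similar cluster in that leaf if the similarity clears a threshold. It is not Drain3's API and not Rocketgraph's code; the masking rules and the TemplateMiner / Cluster names are hypothetical, and sim_th stands in for DRAIN_SIM_TH.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

WILDCARD = "<*>"

# Hypothetical masking rules: variable fields become <*> before parsing.
MASKS = (
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\b0x[0-9a-fA-F]+\b"),           # hex literals
    re.compile(r"\b\d+(?:\.\d+)?\b"),             # integers and decimals
)


def mask(line: str) -> list[str]:
    """Replace variable fields with the wildcard and split on whitespace."""
    for pattern in MASKS:
        line = pattern.sub(WILDCARD, line)
    return line.split()


@dataclass
class Cluster:
    template: list[str]
    size: int = 1


class TemplateMiner:
    """Drain-style miner. Lines are routed by a fixed-depth prefix
    (token count, then first token); each leaf holds candidate clusters."""

    def __init__(self, sim_th: float = 0.4) -> None:
        self.sim_th = sim_th
        self.leaves: dict[tuple[int, str], list[Cluster]] = {}

    @staticmethod
    def similarity(template: list[str], tokens: list[str]) -> float:
        """Share of template positions holding a literal that matches the line."""
        if not template:
            return 1.0
        same = sum(1 for t, x in zip(template, tokens) if t != WILDCARD and t == x)
        return same / len(template)

    def add(self, line: str) -> Cluster:
        tokens = mask(line)
        leaf = self.leaves.setdefault((len(tokens), tokens[0] if tokens else ""), [])
        best = max(leaf, key=lambda c: self.similarity(c.template, tokens), default=None)
        if best is not None and self.similarity(best.template, tokens) >= self.sim_th:
            # Generalise the template: positions that disagree become wildcards.
            best.template = [t if t == x else WILDCARD for t, x in zip(best.template, tokens)]
            best.size += 1
            return best
        cluster = Cluster(template=tokens)
        leaf.append(cluster)
        return cluster


if __name__ == "__main__":
    miner = TemplateMiner(sim_th=0.4)
    for line in ("user 42 logged in from 10.0.0.1",
                 "user 7 logged out from 10.0.0.2",
                 "disk /dev/sda1 is 91% full"):
        print(" ".join(miner.add(line).template))
    # user <*> logged in from <*>
    # user <*> logged <*> from <*>
    # disk /dev/sda1 is <*>% full
```

The second login line matches 3 of the template's 6 positions with a literal (user, logged, from), a similarity of 0.5, so at sim_th=0.4 the two lines merge into one broader template; at 0.6 they would stay separate. That is the "lower → fewer, broader templates" behaviour described for DRAIN_SIM_TH.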

Measured performance

Single container, 4 vCPU, 8 GB RAM, against a real production-shaped workload:

  • 2,002,271 raw logs across 9 services → 58 Drain3 templates → 9 anomalies in ~90 s wall-clock. That time is dominated by the HTTP fetch from ClickHouse, not by the ML stages.
  • Drain3: ~25,000 logs/sec/core.
  • Isolation Forest: <50 ms per service for ≤500 templates.
  • Half-Space-Trees: sub-millisecond per event.
  • Memory footprint: bounded by template count, not log count.

Anomaly signals

Every flagged row carries a reasons array, so downstream alerting can route deterministically:

| reason | trigger |
| --- | --- |
| anomaly_score | HST score ≥ HST_THRESHOLD (default 0.7) for that service |
| new_template | Drain has never seen this template for that service, and level ≥ warn |
| error_burst | ≥ 60% of that service's logs in the last 60 s are errors |
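
The three triggers in the table above can be sketched as one function over a small per-service state. This is a hypothetical illustration, not Rocketgraph's implementation: the ServiceState and reasons_for names, the level ordering, and counting error and fatal as "errors" are assumptions, and hst_score is taken as an input rather than computed here.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field

# Assumed severity ordering; only the relative order matters.
LEVELS = {"debug": 10, "info": 20, "warn": 30, "warning": 30, "error": 40, "fatal": 50}


@dataclass
class ServiceState:
    """Hypothetical state kept for one service."""
    seen_templates: set[str] = field(default_factory=set)
    recent: deque = field(default_factory=deque)  # (timestamp_s, is_error), oldest first


def reasons_for(state: ServiceState, ts: float, template: str, level: str,
                hst_score: float, hst_threshold: float = 0.7,
                burst_ratio: float = 0.6, burst_window_s: float = 60.0) -> list[str]:
    """Return the reasons array for one event; an empty list means not flagged.
    Assumes events for a service arrive in timestamp order."""
    reasons: list[str] = []
    severity = LEVELS.get(level.lower(), LEVELS["info"])

    # anomaly_score: HST score at or above the per-service cutoff.
    if hst_score >= hst_threshold:
        reasons.append("anomaly_score")

    # new_template: first sighting for this service, at warn or above.
    if template not in state.seen_templates and severity >= LEVELS["warn"]:
        reasons.append("new_template")
    state.seen_templates.add(template)

    # error_burst: share of error-or-worse events in the trailing window.
    state.recent.append((ts, severity >= LEVELS["error"]))
    while state.recent[0][0] <= ts - burst_window_s:
        state.recent.popleft()
    errors = sum(1 for _, is_error in state.recent if is_error)
    if errors / len(state.recent) >= burst_ratio:
        reasons.append("error_burst")
    return reasons
```

Applied literally, the 60% rule means a single error after a quiet minute already counts as a 100% burst; a production router would likely also require a minimum event count, which the release notes do not mention.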

Connectors

Six platforms are supported in this release. All of them return the same row shape, {timestamp, message, level, service}, so the downstream pipeline is identical regardless of source:

  • New Relic (NerdGraph / NRQL)
  • Grafana Loki (LogQL, bearer auth)
  • Datadog (API + App key)
  • AWS CloudWatch Logs (IAM, log group + optional stream)
  • Sentry (user auth token, org + project)
  • ClickHouse (HTTP basic auth, configurable column mapping)
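
Because every connector returns the same four fields, normalization can be a thin mapping step. A hypothetical sketch: LogRow, normalize, the mapping format, and the "info" / "unknown" fallbacks are assumptions standing in for the configurable ClickHouse column mapping. The example mapping uses column names like those in the logs table written by the OpenTelemetry Collector's ClickHouse exporter; adjust it to your own schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class LogRow:
    """The shared row shape every connector returns."""
    timestamp: str
    message: str
    level: str
    service: str


IDENTITY_MAPPING = {f: f for f in ("timestamp", "message", "level", "service")}

# Example mapping for an OTel-Collector-style ClickHouse logs table (schema is an assumption).
OTEL_CLICKHOUSE_MAPPING = {
    "timestamp": "Timestamp",
    "message": "Body",
    "level": "SeverityText",
    "service": "ServiceName",
}


def normalize(raw: Mapping[str, Any], mapping: Mapping[str, str] = IDENTITY_MAPPING) -> LogRow:
    """Project a source-specific row onto LogRow using a field -> column mapping."""
    def pick(field_name: str, default: str) -> str:
        value = raw.get(mapping.get(field_name, field_name))
        return default if value is None else str(value)

    return LogRow(
        timestamp=pick("timestamp", ""),
        message=pick("message", ""),
        level=pick("level", "info").lower(),  # hypothetical fallback level
        service=pick("service", "unknown"),   # hypothetical fallback service
    )
```

Keeping the shape frozen and source-agnostic is what lets the Drain, Isolation Forest, and HST stages stay identical whichever connector fetched the rows.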

OpenTelemetry isn't a connector — route OTLP into ClickHouse or Loki via a standard collector config, then point Rocketgraph at that. Minimal collector config is in ml/README.md.

Roadmap: Splunk, Elastic / OpenSearch, Azure Monitor, GCP Cloud Logging.

HTTP API

| Method | Endpoint | Purpose |
| --- | --- | --- |
| GET | /clusters | Cluster a window of logs. Returns templates with anomaly scores and 2-D coords. |
| POST | /clusters/train | Same as /clusters, plus warms the per-service HST model on the same window. |
| POST | /anomalies/detect | Score new logs (inline JSON or fetched via a connector) against the trained HST. Returns only the anomalous rows. |
| POST | /credentials | Dynamic credentials: override .env at runtime, per tenant, no restart. |
| GET | /credentials | Inspect which sources are configured. |
| POST | /detector/reset | Wipe the trained HST. |
| GET | /health | Liveness. |

Time windows: 1h, 6h, 12h, 24h, 1d, or 7d, or an absolute range via start=<ISO>&end=<ISO> (ClickHouse source only).
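
A request to /clusters is a plain GET with query parameters. The sketch below only builds URLs (no network call), so it runs anywhere. The base URL and port, and the name window for the relative time-window parameter, are assumptions; source=clickhouse is taken from the architecture diagram later in these notes, and absolute ranges are restricted to ClickHouse on the reading that the start/end form applies to that source only.

```python
from __future__ import annotations

from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # hypothetical: wherever the container is exposed
RELATIVE_WINDOWS = {"1h", "6h", "12h", "24h", "1d", "7d"}


def clusters_url(source: str, window: str | None = None,
                 start: str | None = None, end: str | None = None) -> str:
    """Build a GET /clusters URL for either a relative window or an absolute range."""
    if (start is None) != (end is None):
        raise ValueError("start and end must be given together")
    if window is not None and start is not None:
        raise ValueError("use a relative window or start/end, not both")
    params = {"source": source}
    if window is not None:
        if window not in RELATIVE_WINDOWS:
            raise ValueError(f"unsupported window: {window!r}")
        params["window"] = window  # parameter name is an assumption
    elif start is not None:
        if source != "clickhouse":
            raise ValueError("absolute ranges are assumed to be ClickHouse-only")
        params.update(start=start, end=end)
    # With neither given, the server is expected to fall back to DEFAULT_LOOKBACK_HOURS.
    return f"{BASE_URL}/clusters?{urlencode(params)}"
```

urlencode percent-encodes the colons in ISO timestamps, so the absolute form is safe to paste into curl or a browser.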

Tunables

| Env var | Default | Purpose |
| --- | --- | --- |
| DRAIN_SIM_TH | 0.4 | Drain3 similarity threshold (lower → fewer, broader templates). |
| ANOMALY_CONTAMINATION | 0.1 | Isolation Forest expected anomaly fraction. |
| HST_THRESHOLD | 0.7 | Half-Space-Trees anomaly cutoff. |
| DEFAULT_LOOKBACK_HOURS | 6 | Used when window is omitted. |
| MAX_ROWS | 100000 | Per-fetch cap; bump for high-volume training windows. |
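
The tunables are plain environment variables, so loading them takes only a few lines. A sketch assuming the names and defaults in the table; the Settings dataclass, load_settings, and the range checks are hypothetical. The (0, 0.5] bound on contamination mirrors scikit-learn's IsolationForest, and the [0, 1] bound on HST_THRESHOLD assumes scores are normalized to that range; the notes state neither.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the tunables table.
    drain_sim_th: float = 0.4
    anomaly_contamination: float = 0.1
    hst_threshold: float = 0.7
    default_lookback_hours: int = 6
    max_rows: int = 100_000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the defaults."""
    d = Settings()
    s = Settings(
        drain_sim_th=float(env.get("DRAIN_SIM_TH", d.drain_sim_th)),
        anomaly_contamination=float(env.get("ANOMALY_CONTAMINATION", d.anomaly_contamination)),
        hst_threshold=float(env.get("HST_THRESHOLD", d.hst_threshold)),
        default_lookback_hours=int(env.get("DEFAULT_LOOKBACK_HOURS", d.default_lookback_hours)),
        max_rows=int(env.get("MAX_ROWS", d.max_rows)),
    )
    # Hypothetical sanity checks: fail fast at startup rather than mid-request.
    if not 0.0 <= s.drain_sim_th <= 1.0:
        raise ValueError("DRAIN_SIM_TH must be in [0, 1]")
    if not 0.0 < s.anomaly_contamination <= 0.5:
        raise ValueError("ANOMALY_CONTAMINATION must be in (0, 0.5]")
    if not 0.0 <= s.hst_threshold <= 1.0:
        raise ValueError("HST_THRESHOLD must be in [0, 1]")
    if s.default_lookback_hours <= 0 or s.max_rows <= 0:
        raise ValueError("DEFAULT_LOOKBACK_HOURS and MAX_ROWS must be positive")
    return s
```

Passing the environment as a mapping keeps the loader testable without touching os.environ.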

Deployment

  • Single stateless container. docker compose up is the whole install. No DB, no auth provider, no agents on hosts.
  • VPC-only. No outbound traffic except to the connectors you configure. The container does not phone home.
  • Secrets. .env for static creds, POST /credentials for dynamic/per-tenant creds. Held in memory only — never written to disk or logs.
  • Optional signing middleware. SIGNING_SECRET gates every endpoint behind an X-Signing-Secret header.
  • Air-gap-ready. Designed for FedRAMP / HIPAA / SOC2-style environments.
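
A minimal sketch of the signing gate, assuming the X-Signing-Secret header carries the shared SIGNING_SECRET value verbatim (the notes do not say whether it is a raw secret or an HMAC of the request). The function and constant names are hypothetical. Two details matter even in a sketch: header names are case-insensitive, and the comparison should be constant-time so response timing does not leak the secret.

```python
from __future__ import annotations

import hmac
from typing import Mapping

HEADER = "x-signing-secret"


def is_authorized(headers: Mapping[str, str], signing_secret: str | None) -> bool:
    """Return True if the request may proceed.

    With no SIGNING_SECRET configured the gate is disabled (assumed behaviour,
    matching "optional" in the notes)."""
    if not signing_secret:
        return True
    # HTTP header names are case-insensitive, so match on the lowercased name.
    supplied = next((v for k, v in headers.items() if k.lower() == HEADER), None)
    if supplied is None:
        return False
    # Constant-time comparison so response timing does not reveal the secret.
    return hmac.compare_digest(supplied.encode("utf-8"), signing_secret.encode("utf-8"))
```

In a FastAPI app this check would typically sit in an HTTP middleware that returns 401 on failure; how Rocketgraph wires it, and whether /health is exempt, is not specified here.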

@rgraph/otel-node — AI agent for OpenTelemetry instrumentation

Most teams want anomaly detection but lack the upstream pipeline that produces structured telemetry in the first place. @rgraph/otel-node (packages/otel-node) closes that gap.

```sh
export ROCKETGRAPH_API_KEY=rg_live_xxxxxxxxxxxx
cd ~/your-node-service
npx @rgraph/otel-node init
```

What that single command does:

  1. Reads package.json and the lockfile to detect framework, language (TS/JS), and package manager (npm / yarn / pnpm / bun).
  2. Scans the dependency tree to find every HTTP, DB, queue, and cache client in use.
  3. Writes (or merges into, with a .bak) an instrumentation.ts / .js tailored to that exact stack.
  4. Installs the required @opentelemetry/* packages using the detected package manager.
  5. Prints the exact --require / --import flag needed to load the file at startup. For Next.js, it picks up experimental.instrumentationHook automatically instead.
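
Steps 1 and 3 boil down to reading lockfile names and package.json. Here is a hypothetical Python sketch of that detection (the real tool is a Node CLI whose internals are not shown in these notes). The lockfile table uses each package manager's standard lockfile name; the npm fallback and the TypeScript heuristic (a typescript dependency or a tsconfig.json) are assumptions.

```python
from __future__ import annotations

import json
from pathlib import Path

# Standard lockfile name -> package manager, checked in this order.
LOCKFILES = (
    ("bun.lockb", "bun"),  # Bun's older binary lockfile
    ("bun.lock", "bun"),   # Bun's newer text lockfile
    ("pnpm-lock.yaml", "pnpm"),
    ("yarn.lock", "yarn"),
    ("package-lock.json", "npm"),
)


def detect(package_json: dict, files: set[str]) -> dict[str, str]:
    """Pure detection logic over package.json contents and top-level file names."""
    deps = {**package_json.get("dependencies", {}), **package_json.get("devDependencies", {})}
    manager = next((pm for lock, pm in LOCKFILES if lock in files), "npm")  # assumed fallback
    is_ts = "typescript" in deps or "tsconfig.json" in files
    return {
        "packageManager": manager,
        "language": "typescript" if is_ts else "javascript",
        "instrumentationFile": "instrumentation.ts" if is_ts else "instrumentation.js",
    }


def detect_dir(project_dir: str | Path) -> dict[str, str]:
    """Read a project directory from disk and run detect() on it."""
    root = Path(project_dir)
    package_json = json.loads((root / "package.json").read_text(encoding="utf-8"))
    return detect(package_json, {p.name for p in root.iterdir()})
```

Splitting the pure detect() from the filesystem wrapper keeps the decision logic easy to unit-test.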

Detected stacks

| Category | Detected |
| --- | --- |
| Frameworks | Express, Fastify, NestJS, Koa, Hapi, Restify, Next.js, Nuxt |
| HTTP / RPC | http, https, grpc, @grpc/grpc-js |
| Databases | pg, mysql2, mongodb, mongoose, redis, ioredis, prisma |
| Queues / cloud | amqplib, kafkajs, aws-sdk, @aws-sdk/* |
| Package managers | npm, yarn, pnpm, bun (auto from lockfile) |
| Languages | TypeScript, JavaScript |
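
The dependency scan behind the detected-stacks table amounts to mapping npm package names onto OpenTelemetry instrumentation packages. An illustrative subset follows; the target names are published @opentelemetry/* packages (plus Prisma's @prisma/instrumentation), but the mapping, the baseline package list, and the plan function are hypothetical, not the agent's actual selection logic.

```python
from __future__ import annotations

# Illustrative subset: npm dependency name -> instrumentation package to install.
INSTRUMENTATIONS = {
    "express": "@opentelemetry/instrumentation-express",
    "koa": "@opentelemetry/instrumentation-koa",
    "@hapi/hapi": "@opentelemetry/instrumentation-hapi",
    "restify": "@opentelemetry/instrumentation-restify",
    "@nestjs/core": "@opentelemetry/instrumentation-nestjs-core",
    "@grpc/grpc-js": "@opentelemetry/instrumentation-grpc",
    "pg": "@opentelemetry/instrumentation-pg",
    "mysql2": "@opentelemetry/instrumentation-mysql2",
    "mongodb": "@opentelemetry/instrumentation-mongodb",
    "mongoose": "@opentelemetry/instrumentation-mongoose",
    "ioredis": "@opentelemetry/instrumentation-ioredis",
    "amqplib": "@opentelemetry/instrumentation-amqplib",
    "kafkajs": "@opentelemetry/instrumentation-kafkajs",
    "@prisma/client": "@prisma/instrumentation",
}

# Hypothetical baseline: API, Node SDK, HTTP instrumentation, OTLP/HTTP trace exporter.
BASE_PACKAGES = (
    "@opentelemetry/api",
    "@opentelemetry/sdk-node",
    "@opentelemetry/instrumentation-http",
    "@opentelemetry/exporter-trace-otlp-http",
)


def plan(dependencies: dict[str, str]) -> list[str]:
    """Return the sorted set of packages to install for the given dependencies."""
    packages = set(BASE_PACKAGES)
    for name in dependencies:
        if name in INSTRUMENTATIONS:
            packages.add(INSTRUMENTATIONS[name])
        elif name == "aws-sdk" or name.startswith("@aws-sdk/"):
            # A single instrumentation package covers AWS SDK v2 and v3 clients.
            packages.add("@opentelemetry/instrumentation-aws-sdk")
    return sorted(packages)
```

A simpler alternative is to install @opentelemetry/auto-instrumentations-node, which bundles many of these; a per-dependency plan keeps the install and startup footprint smaller.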

Modes

  • Agent mode (default) — Claude-powered, reads your code, writes the right instrumentation file. No templates, no manual @opentelemetry/instrumentation-* package selection.
  • Legacy template mode (--legacy) — deterministic, no LLM. Use this in CI where reproducibility matters.
  • --dry-run --legacy — print the file that would be written + the packages that would be installed. No changes.
  • otel-node instrument — agent goes further: adds structured error handlers, span attributes, and observability code throughout the app.
  • otel-node detect — JSON report of what the detector sees. Useful for debugging or auditing in CI.
  • otel-node uninstall — removes the generated file and .bak. Leaves OTel packages installed.

Where it fits

```text
Your Node service                  OTel Collector              Sink              Rocketgraph ML
─────────────────                  ─────────────              ────              ──────────────
@rgraph/otel-node init       ──>   OTLP HTTP / gRPC    ──>   ClickHouse   ──>   /clusters?source=clickhouse
(writes instrumentation)            (or any platform)        Loki                /anomalies/detect
                                                             Datadog
                                                             New Relic
```

The agent only owns the leftmost step — getting telemetry out of your service over standards-compliant OTLP. The right half is whatever observability platform you already pay for. No proprietary protocol, no custom SDK, no lock-in.


🎨 Rebrand to Rocketgraph

Project consolidated under the Rocketgraph name (formerly rgraph / RocketsGraphQL). New logo, new domain at rocketgraph.app, new docs structure. Engine and APIs unchanged — only branding and packaging.

The README now ships a real demo GIF (images/logs-snapshot.gif) showing the 2M-logs → 58-templates pipeline in motion, so you can see what the engine actually does without cloning anything.


📜 License change — BSL 1.1, converts to AGPL-3.0 on 2030-05-27

Rocketgraph is moving from AGPL-3.0 to Business Source License 1.1 with a Change Date of 2030-05-27, at which point the license converts to AGPL-3.0 automatically.

What you can do under BSL today (free of charge):

  • Use Rocketgraph in production, internally, at any scale.
  • Modify it, fork it, embed it in your stack.
  • Self-host it inside your own infrastructure.

What you can't do under BSL today (until the Change Date):

  • Offer Rocketgraph as a competing hosted / managed service to third parties.

On 2030-05-27, the BSL terms expire and every version up to that point becomes AGPL-3.0; no action is required from you. The end state is the same protection model the AGPL gave you, preceded by a four-year window of source-available commercial protection.

The license shift is the standard "MariaDB / Sentry / CockroachDB" pattern — designed to keep the project self-hostable and auditable while leaving room to fund continued development.

Full text: LICENSE.txt.


🔗 Community
