mafischer/oRKLLM

oRKLLM

              )       (
             ( \     / )          ██████╗ ██████╗ ██╗  ██╗██╗     ██╗     ███╗   ███╗
              \_\   /_/          ██╔═══██╗██╔══██╗██║ ██╔╝██║     ██║     ████╗ ████║
            .-----------.        ██║   ██║██████╔╝█████╔╝ ██║     ██║     ██╔████╔██║
           /  [*]   [*]  \       ██║   ██║██╔══██╗██╔═██╗ ██║     ██║     ██║╚██╔╝██║
          |    \  ω  /    |      ╚██████╔╝██║  ██║██║  ██╗███████╗███████╗██║ ╚═╝ ██║
           \  .-------.  /        ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚══════╝╚══════╝╚═╝     ╚═╝
          _/\/  #####  \/\_
         /  /   #####   \  \      Pronounced "ORC-EL-EL-EM"
        / ,/    #####    \, \     OpenAI-compatible LLM inference for Rockchip NPU.
       | / |  .-------.  | \ |    No cloud. No nonsense. Just efficient NPU inference.
       |/  '--[=======]--'  \|
       |       |     |       |
        \   ,  |     |  ,   /
         \  \. |     | ./  /
          '--' |     | '--'
               |     |
              / \   / \
             '   '-'   '

oRKLLM is an energy-efficient, OpenAI API-compatible local LLM inference server and premium admin console designed specifically for Rockchip NPU-powered platforms (such as the RK3576 found in the NanoPi M5 and RK3588 series SBCs).

Inspired by jundot/oMLX (which does the same for Apple Silicon), oRKLLM is re-engineered for the Rockchip RKLLM runtime (librkllmrt.so) and its particular hardware and concurrency constraints.


🚀 Key Features

  • OpenAI API Compatibility: Drop-in /v1/chat/completions, /v1/models, and /v1/embeddings endpoints that work with Open WebUI, Claude Code, and any OpenAI-compatible client.
  • Full Admin Console: Built with Vue 3 and Vuetify 3, with six dedicated pages:
    • Dashboard: live CPU/NPU/GPU/RAM/Disk/Temperature gauges, serving stats, prefix cache observability, RKLLM runtime versions
    • Models: local model manager, HuggingFace search, collection browser, direct downloader
    • Settings: inference defaults, HF token, prefix cache config, trusted proxy
    • Logs: full-page real-time log terminal over WebSocket
    • Bench: inference benchmark (TTFT, prefill tok/s, generation tok/s)
    • Chat: full streaming chat UI with conversation history sidebar (grouped by model), message queueing during inference, system prompt, model selector, and parameter controls
  • Conversation History: Chat sessions persisted in SQLite grouped by model. Collapsible sidebar on desktop, bottom-sheet on mobile. Partial responses saved via sendBeacon on page navigation.
  • Pin Model: Pin the active model to prevent idle auto-unload. Pin state persists across server restarts and triggers automatic model load on startup when sufficient RAM is available.
  • Multi-User Auth & RBAC: Local accounts or federated SSO via OIDC/SAML (Keycloak, Google, Azure AD). Two roles: admin and user. Site Management UI for user CRUD, auth provider config, and audit log.
  • OIDC / SAML SSO: Standard Flow with PKCE for public clients (no secret required). Group-to-role mapping from IdP claims. Routes at /auth/oidc/* and /auth/saml/*.
  • HuggingFace Integration: Search the HF Hub, browse collections, download .rkllm models directly. Search results show parameter count and storage size. A Compatible chipset filter auto-detects your SoC (RK3576/RK3588) from the board's device tree and appends it to the query, preventing downloads of models built for the wrong platform. The Download button queues all repo files simultaneously with per-file progress bars, speeds, and byte counters grouped by repo. Files are saved to models/{repoName}/.
  • Prefix KV Cache: Tiered SSD hot/cold LRU cache saves KV state between conversation turns. Sliding context window (configurable up to 32,768 tokens, default 8,192) prevents NPU OOM on long conversations.
  • Process-Isolated Execution: Inference engine runs in a dedicated child process. Model unload/swap terminates the process, guaranteeing full NPU driver memory cleanup.
  • Smart Resource Management: Single active model lock, auto-swap, configurable idle timeout, pin-to-keep-loaded.
  • Runtime Version Auto-Matching & Auto-Download: oRKLLM reads the embedded version from each librkllmrt.so (via strings), matches it against the version in the model filename, and retries all candidates until one succeeds, caching the winner per model. On first setup, you can opt in to automatically downloading all versioned runtimes from mafischer/rkllm-runtimes (Apache 2.0). Opted-out users are prompted with a disclaimer dialog in the UI; API callers receive HTTP 422 RUNTIME_MISSING with the required version. The setting can be toggled in Settings after setup.
  • APT Distribution Channels: Three channels (stable on main, beta, and alpha), each with its own dists/<channel>/ directory on gh-pages. Users pin to their preferred channel.
  • Trusted Proxy: Supports true, single IP/CIDR, or comma-separated list (SAN-style) passed directly to Fastify's trustProxy.
  • Database Migrations: PRAGMA user_version migration runner. Schema changes (v1–v3) apply automatically on startup and are safe across upgrades from any previous version.
  • Seamless Mock Fallback: On non-ARM64/non-Linux platforms, oRKLLM falls back to a JS mock engine, allowing rapid UI development on macOS/Windows without a board.
  • Dynamic N-API Bindings: The C++ addon uses dlopen/dlsym, so there is no compile-time dependency on librkllmrt.so.
  • Secure Auth: PBKDF2-HMAC-SHA256 password hashing, signed session cookies (userId|username|role|expires|HMAC), backward-compatible with single-user installs.
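
The PRAGMA user_version migration runner in the list above follows a common SQLite pattern: the stored version number says which migrations have already run, and each pending step bumps it by one. The following is a minimal sketch of that pattern, not oRKLLM's actual code. The table definitions, the function names, and the db wrapper (getVersion, setVersion, exec) are hypothetical stand-ins so the example runs without SQLite.

```javascript
// Hypothetical migration list: entry i upgrades the schema from version i to i + 1.
const MIGRATIONS = [
  'CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE)',
  "ALTER TABLE users ADD COLUMN role TEXT DEFAULT 'user'",
  'CREATE TABLE conversations (id INTEGER PRIMARY KEY, model TEXT, title TEXT)',
];

// Apply every migration newer than the stored version, one step at a time.
// `db` is any object exposing getVersion(), setVersion(n) and exec(sql).
function migrate(db, migrations = MIGRATIONS) {
  const applied = [];
  for (let v = db.getVersion(); v < migrations.length; v++) {
    db.exec(migrations[v]);
    db.setVersion(v + 1); // with real SQLite: PRAGMA user_version = <v + 1>
    applied.push(v + 1);
  }
  return applied;
}

// In-memory stand-in for a database handle (assumption, for illustration only).
function fakeDb(startVersion = 0) {
  let version = startVersion;
  const log = [];
  return {
    getVersion: () => version,
    setVersion: (n) => { version = n; },
    exec: (sql) => { log.push(sql); },
    log,
  };
}
```

Because the loop starts at the stored version, running migrate twice is harmless: the second call finds nothing left to apply.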

🛠️ Architecture & Tech Stack

graph TD
    Client[HTTP Client / Open WebUI] -->|REST API| Fastify[Fastify Server]
    Fastify -->|Admin SPA| Admin[Vue 3 / Vuetify Admin]
    Fastify -->|OpenAI Routes| API[OpenAI API Router]

    API -->|Queue Request| Pool[Engine Pool & Resource Manager]
    Pool -->|Spawn / Message| Worker[Worker Process]
    Worker -->|N-API Addon| Addon[orkllm_napi.node]
    Addon -->|Dynamic dlopen| C_API[librkllmrt.so C API]
    C_API -->|NPU Driver| NPU[Rockchip NPU Hardware]

    Admin -->|WebSocket Telemetry| Monitor[Telemetry Monitor]
    Monitor -->|/sys/kernel/debug/rknpu| Linux[Linux Kernel]

| Layer | Technology |
|-------|------------|
| API Server | Node.js + Fastify (ES Modules) |
| Native Bindings | C++ N-API addon (node-addon-api) with dlopen/dlsym |
| Mock Fallback | Pure JS mock engine (auto-enabled on non-ARM64/non-Linux) |
| Frontend | Vue 3 + Vuetify 3 SPA, built with Vite, route-based code splitting |
| Database | SQLite via node:sqlite (Node ≥ 22.5) or better-sqlite3 (Node 20) |
| Auth | Local PBKDF2 + OIDC (PKCE) + SAML 2.0 |
| Testing | Playwright E2E (64 tests across 3 spec files), mock OIDC service container in CI |
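
The mock fallback rule (auto-enabled on non-ARM64/non-Linux) amounts to a platform check. A minimal sketch, assuming the decision depends only on Node's process.platform and process.arch; the function name is made up for this example and the real project may consider other signals:

```javascript
// Returns true when the JS mock engine should replace the native NPU engine.
// Assumption: only Linux on ARM64 can drive the Rockchip NPU.
function shouldUseMockEngine(platform = process.platform, arch = process.arch) {
  return !(platform === 'linux' && arch === 'arm64');
}

console.log(shouldUseMockEngine('darwin', 'arm64')); // macOS on Apple Silicon
console.log(shouldUseMockEngine('linux', 'arm64'));  // Rockchip board
```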

📦 Installing from a Release Package (Ubuntu / Armbian ARM64)

Pre-built .deb packages for ARM64 are available via the oRKLLM APT repository or directly from the GitHub Releases page.

Option A: APT repository (recommended)

Three channels are available:

| Channel | Branch | Description |
|---------|--------|-------------|
| stable | main | Production releases, recommended for most users |
| beta | beta | Release candidates promoted from alpha after 48 h with no bug reports |
| alpha | alpha | Cutting-edge development builds |

# Trust the oRKLLM signing key
curl -fsSL https://mafischer.github.io/oRKLLM/orkllm.gpg \
  | sudo gpg --dearmor -o /usr/share/keyrings/orkllm.gpg

# Add the repository; replace 'stable' with 'beta' or 'alpha' to follow pre-releases
echo "deb [arch=arm64 signed-by=/usr/share/keyrings/orkllm.gpg] \
  https://mafischer.github.io/oRKLLM stable main" \
  | sudo tee /etc/apt/sources.list.d/orkllm.list

sudo apt update && sudo apt install orkllm

Option B: Direct download

# VERSION must match the latest release, because the URL below resolves against releases/latest
VERSION=0.7.0
wget https://github.com/mafischer/oRKLLM/releases/latest/download/orkllm_${VERSION}_arm64.deb
sudo dpkg -i orkllm_${VERSION}_arm64.deb

Configure

sudo nano /etc/orkllm/orkllm.conf
ORKLLM_HOST=0.0.0.0
ORKLLM_PORT=8000
ORKLLM_LIB_PATH=/usr/lib/librkllmrt.so
ORKLLM_MODELS_DIR=/var/lib/orkllm/models
ORKLLM_DB_PATH=/var/lib/orkllm/orkllm.db
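
The configuration file is a flat list of KEY=VALUE assignments. To illustrate the format, here is a minimal sketch of a parser for it. This is not the project's loader: the function name is made up, and treating lines that start with # as comments is an assumption.

```javascript
// Parse KEY=VALUE lines into a plain object.
// Blank lines and lines starting with '#' are skipped (assumed comment syntax).
function parseConf(text) {
  const out = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue; // not an assignment, ignore
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

const sample = [
  '# example orkllm.conf',
  'ORKLLM_HOST=0.0.0.0',
  'ORKLLM_PORT=8000',
  'ORKLLM_MODELS_DIR=/var/lib/orkllm/models',
].join('\n');

const conf = parseConf(sample);
console.log(conf.ORKLLM_HOST, Number(conf.ORKLLM_PORT));
```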

Add models and start

sudo cp your_model.rkllm /var/lib/orkllm/models/
sudo systemctl start orkllm

Admin console: http://<device-ip>:8000/admin
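
Once the service is running, the OpenAI-compatible endpoints are served on the same port as the admin console. The sketch below builds a chat completion request for Node's built-in fetch. The helper name, base URL, and model name are placeholders rather than values from the project, and any authentication header your installation may require is left out.

```javascript
// Build the URL and fetch() options for an OpenAI-style chat completion.
// The helper name, base URL and model name are illustrative placeholders.
function buildChatRequest(baseUrl, model, messages, { stream = false } = {}) {
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/v1/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages, stream }),
    },
  };
}

// Usage against a running server (not executed here):
//   const { url, init } = buildChatRequest('http://<device-ip>:8000', '<model-id>', [
//     { role: 'user', content: 'Hello' },
//   ]);
//   const res = await fetch(url, init);
//   const data = await res.json();
//   console.log(data.choices[0].message.content);
```

The model identifiers a given server accepts can be listed with a GET request to /v1/models.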

Service management

sudo systemctl status orkllm    # or: start, stop, restart
journalctl -u orkllm -f         # follow the service log

⚙️ Installation from Source

Prerequisites

  • Node.js ≥ 18 (≥ 22.5 preferred for native node:sqlite)
  • node-gyp dependencies: Python 3, C++ compiler (Xcode CLT on macOS, build-essential on Linux)
  • A compiled .rkllm model (use rkllm-toolkit to convert from HuggingFace)
  • librkllmrt.so on the target board (typically at /usr/lib/librkllmrt.so)

Setup & Run

# Install all dependencies (compiles native addon)
npm install

# Build Vue frontend
npm run build:frontend

# Start development server (mock engine auto-enabled on macOS)
npm run dev:server
# → http://localhost:8000/admin

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| ORKLLM_HOST | 127.0.0.1 | Listen address (0.0.0.0 for LAN) |
| ORKLLM_PORT | 8000 | Listen port |
| ORKLLM_LIB_PATH | /usr/lib/librkllmrt.so | Path to Rockchip RKLLM runtime |
| ORKLLM_MODELS_DIR | ./models | Directory scanned for .rkllm files |
| ORKLLM_DB_PATH | ~/.config/orkllm/auth.db | SQLite database path |
| ORKLLM_TRUSTED_PROXY | (unset) | true (all), a single IP/CIDR, or comma-separated IPs/CIDRs whose X-Forwarded-* headers are trusted |
| ORKLLM_RUNTIMES_DIR | ~/.config/orkllm/runtimes | Directory of versioned librkllmrt-aarch64-vX.Y.Z.so files for automatic runtime matching |
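
ORKLLM_TRUSTED_PROXY accepts three shapes of value, all of which map onto forms that Fastify's trustProxy option understands (a boolean, a single address string, or a list). The sketch below shows one way to normalise the variable before handing it to Fastify. It is an illustration under that assumption, not the project's code, and the function name is made up.

```javascript
// Convert the ORKLLM_TRUSTED_PROXY string into a value for Fastify's trustProxy.
//   unset or empty    -> false (trust no proxy)
//   "true"            -> true  (trust all proxies)
//   one IP/CIDR       -> that string
//   comma-separated   -> array of trimmed IPs/CIDRs
function parseTrustedProxy(value) {
  if (value === undefined || value.trim() === '') return false;
  const v = value.trim();
  if (v.toLowerCase() === 'true') return true;
  const parts = v.split(',').map((s) => s.trim()).filter(Boolean);
  return parts.length === 1 ? parts[0] : parts;
}

console.log(parseTrustedProxy('10.0.0.1, 192.168.1.0/24'));

// Usage with Fastify (not executed here):
//   const app = Fastify({ trustProxy: parseTrustedProxy(process.env.ORKLLM_TRUSTED_PROXY) });
```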

🧪 Running Tests

# Full E2E suite (mock mode, no board required)
npm test

# SSO integration tests using local Keycloak container (same as CI)
npm run test:sso        # starts Keycloak + runs SSO tests
npm run test:sso:down   # tear down Keycloak when done

CI runs the full suite including OIDC SSO via a containerised Keycloak instance with a pre-configured orkllm realm.

Test environment variables

Set these in .env locally (gitignored) or as GitHub Actions secrets/variables. The .env file is loaded automatically by Playwright.

| Variable | Where | Description |
|----------|-------|-------------|
| ORKLLM_TEST_ADMIN_USER | Secret | Admin username created during test setup |
| ORKLLM_TEST_ADMIN_PASS | Secret | Admin password |
| ORKLLM_TEST_OIDC_ISSUER | Secret | Real Keycloak issuer URL (for ORKLLM_TEST_LIVE=1) |
| ORKLLM_TEST_OIDC_CLIENT_ID | Secret | OIDC client ID (orkllm-oidc) |
| ORKLLM_TEST_SAML_METADATA_URL | Secret | Real Keycloak SAML metadata URL |
| ORKLLM_TEST_OIDC_USER | Secret | Keycloak test user (testuser) |
| ORKLLM_TEST_OIDC_USER_PASS | Secret | Keycloak test user password |
| ORKLLM_TEST_OIDC_ADMIN_USER | Secret | Keycloak admin test user (testadminuser) |
| ORKLLM_TEST_OIDC_ADMIN_PASS | Secret | Keycloak admin test user password |
| ORKLLM_TEST_MOCK_OIDC_URL | Auto-set | Issuer URL of CI Keycloak container (http://localhost:8080/realms/orkllm) |
| ORKLLM_TEST_REDIRECT_BASE | Auto-set | Base URL for the OIDC redirect_uri; the redirect is derived from it so the protocol is correct (http:// in CI, https:// live) |
| ORKLLM_TEST_LIVE | Variable | Set to 1 to run SSO tests against real Keycloak on LAN |
| ORKLLM_TEST_LIVE_URL | Variable | Live server URL (e.g. https://orkllm.fischerapps.com) |

Debugging failed CI tests

When E2E tests fail in CI, Playwright uploads screenshots and error context as an artifact named playwright-report (retained 7 days).

Download via CLI:

gh run download <run-id> --name playwright-report -D /tmp/report
# Find the run ID with: gh run list --limit 5

Download via browser: GitHub Actions run → Summary → Artifacts section at the bottom → download playwright-report.zip.

Each failed test has a test-failed-1.png screenshot and an error-context.md with the stack trace, making it easy to see exactly what the browser showed at the point of failure.


⚙️ RKLLM Runtime Auto-Downloader

oRKLLM requires a versioned copy of Rockchip's librkllmrt.so runtime library to drive NPU inference. Each .rkllm model file is compiled against a specific runtime version (e.g. 1.2.3), and loading a model with the wrong version fails immediately.

How it works

  1. oRKLLM parses the runtime version from the model filename (e.g. Qwen3-8B-rk3576-w4a16-1.2.3.rkllm).
  2. It searches ORKLLM_RUNTIMES_DIR (~/.config/orkllm/runtimes/ by default) for a matching librkllmrt-aarch64-v1.2.3.so.
  3. If none matches, it retries with all other available runtimes newest-first, then falls back to the system /usr/lib/librkllmrt.so.
  4. The winning library is cached per model so future loads skip straight to it.
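
The lookup order in steps 1 to 3 can be sketched as a pure function that returns the libraries to try, in order. This is an illustration of the described behaviour rather than the project's implementation. The function names and the version regex are assumptions, and the per-model cache from step 4 is left out.

```javascript
// Extract "1.2.3" from names such as "Qwen3-8B-rk3576-w4a16-1.2.3.rkllm"
// or "...-v1.2.3-RKLLM.rkllm". Returns null when no version is present.
function versionFromModel(filename) {
  const m = filename.match(/-v?(\d+\.\d+\.\d+)(?:-RKLLM)?\.rkllm$/i);
  return m ? m[1] : null;
}

// Numeric comparison of two "X.Y.Z" strings (negative when a < b).
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// `available` lists the versions found in ORKLLM_RUNTIMES_DIR.
// Order: exact match, then the rest newest-first, then the system library.
function candidateRuntimes(modelFile, available, systemLib = '/usr/lib/librkllmrt.so') {
  const wanted = versionFromModel(modelFile);
  const newestFirst = [...available].sort((a, b) => compareVersions(b, a));
  const ordered = [
    ...newestFirst.filter((v) => v === wanted),
    ...newestFirst.filter((v) => v !== wanted),
  ];
  return [...ordered.map((v) => `librkllmrt-aarch64-v${v}.so`), systemLib];
}

console.log(candidateRuntimes('Qwen3-8B-rk3576-w4a16-1.2.3.rkllm', ['1.1.4', '1.2.3', '1.2.1']));
```

A caller would try each candidate in turn and remember the first one that loads, for example in a Map keyed by model filename.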

Auto-download (opt-in)

During first-time setup you are prompted to opt in to auto-downloading runtimes. When enabled:

  • All available runtime versions are downloaded in the background at server startup.
  • When a model is loaded whose required runtime is not yet present, oRKLLM downloads it automatically before retrying the load.
  • The toggle can be changed at any time in Settings → Runtime Auto-Download.

When opted out, the UI shows a disclaimer dialog before downloading, and API callers receive HTTP 422 RUNTIME_MISSING with the required version.
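
An API client can detect the opted-out case from the response. Only the status code (422) and the RUNTIME_MISSING marker are documented here; the layout of the response body is not, so the sketch below searches the serialised body instead of assuming field names. The function name and the sample body are hypothetical.

```javascript
// Returns { missing: true, version } for a 422 RUNTIME_MISSING response, else null.
// `body` may be a parsed JSON object or a raw string.
function describeRuntimeError(status, body) {
  if (status !== 422) return null;
  const text = typeof body === 'string' ? body : JSON.stringify(body ?? '');
  if (!text.includes('RUNTIME_MISSING')) return null;
  const version = text.match(/\d+\.\d+\.\d+/);
  return { missing: true, version: version ? version[0] : null };
}

// Hypothetical body shape, used only to exercise the function.
const sampleBody = { error: { code: 'RUNTIME_MISSING', message: 'requires runtime 1.2.3' } };
console.log(describeRuntimeError(422, sampleBody));
```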

Runtime mirror

Pre-built librkllmrt.so binaries for aarch64 and armhf are published at:

github.com/mafischer/rkllm-runtimes

The mirror syncs from airockchip/rknn-llm nightly. All versions from v1.0.1 onward are available.

Direct download

VERSION=v1.2.3
ARCH=aarch64   # or armhf

curl -fsSL \
  https://github.com/mafischer/rkllm-runtimes/releases/download/${VERSION}/librkllmrt-${ARCH}-${VERSION}.so \
  -o ~/.config/orkllm/runtimes/librkllmrt-${ARCH}-${VERSION}.so
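
For programmatic use, the same URL pattern can be assembled in code. This is a minimal sketch that mirrors the curl command above; the function name is made up, and the validation rules (three-part version, aarch64 or armhf) are assumptions drawn from the examples in this section.

```javascript
// Build the release asset URL for a runtime, following the pattern used above.
// `version` is given without the leading "v", e.g. "1.2.3".
function runtimeAssetUrl(version, arch = 'aarch64') {
  if (!/^\d+\.\d+\.\d+$/.test(version)) throw new Error(`bad version: ${version}`);
  if (!['aarch64', 'armhf'].includes(arch)) throw new Error(`bad arch: ${arch}`);
  const tag = `v${version}`;
  return `https://github.com/mafischer/rkllm-runtimes/releases/download/${tag}/librkllmrt-${arch}-${tag}.so`;
}

console.log(runtimeAssetUrl('1.2.3'));
```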

Licensing

librkllmrt.so is Rockchip proprietary software distributed by Airockchip under the Apache 2.0 License as part of the rknn-llm repository. The Apache 2.0 license explicitly permits redistribution with attribution. The mirror at mafischer/rkllm-runtimes reproduces this license in full on every release.

oRKLLM does not modify the binaries. They are downloaded verbatim from the upstream repository and re-published as properly versioned GitHub release artifacts for programmatic access.


Model Naming Convention

To help establish consistency across the fragmented Rockchip community, oRKLLM adopts a single unified naming convention for both the HuggingFace repository and the .rkllm file inside it.

Unified format

{Family}-{Params}-{Variant}-{Chipset}-{Quant}-{Algo}-v{Version}-RKLLM.rkllm

The HuggingFace repository name is the same string without the .rkllm extension.

Example: Qwen3-4B-Base-rk3576-w4a16-grq-v1.2.3-RKLLM
File inside repo: Qwen3-4B-Base-rk3576-w4a16-grq-v1.2.3-RKLLM.rkllm

| Field | Description | Example |
|-------|-------------|---------|
| Family | Base model name | Qwen3, Llama3, Gemma2 |
| Params | Parameter count | 4B, 8B, 0.5B, 35BA3B |
| Variant | Model variant | Base, Instruct, Chat |
| Chipset | Target Rockchip SoC | rk3576, rk3588 |
| Quant | Quantization type | w4a16, w8a8 |
| Algo | Quantization algorithm | grq, awq, gptq |
| Version | rkllm-toolkit version (with v prefix) | v1.2.3 |
| RKLLM | Required suffix for HuggingFace discoverability | n/a |

Note: oRKLLM parses the runtime version from the v{Version} field in the filename to auto-select the correct librkllmrt.so. Always include the version. Legacy files without the v prefix and -RKLLM suffix are also supported.
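
The unified format is regular enough to parse with a single expression. The sketch below splits a name into its fields and rebuilds it. It is an illustration, not the parser oRKLLM uses: the regex and function names are assumptions, it expects hyphen-free field values, and it returns null for legacy names that lack the v prefix or the -RKLLM suffix.

```javascript
// One named group per field of the unified format; the .rkllm extension is optional
// so the same pattern accepts both the file name and the repository name.
const NAME_RE =
  /^(?<family>[^-]+)-(?<params>[^-]+)-(?<variant>[^-]+)-(?<chipset>rk\d+)-(?<quant>w\d+a\d+)-(?<algo>[^-]+)-v(?<version>\d+\.\d+\.\d+)-RKLLM(?:\.rkllm)?$/i;

// Returns an object with the seven fields, or null if the name does not match.
function parseModelName(name) {
  const m = NAME_RE.exec(name);
  return m ? { ...m.groups } : null;
}

// Inverse operation: build the .rkllm file name from its fields.
function formatModelName({ family, params, variant, chipset, quant, algo, version }) {
  return `${family}-${params}-${variant}-${chipset}-${quant}-${algo}-v${version}-RKLLM.rkllm`;
}

const parsed = parseModelName('Qwen3-4B-Base-rk3576-w4a16-grq-v1.2.3-RKLLM.rkllm');
console.log(parsed);
console.log(formatModelName(parsed));
```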

Recommended HuggingFace tags

Including these tags maximises discoverability and enables oRKLLM's compatible-chipset search filter to surface your model:

| Category | Tags |
|----------|------|
| Core | rkllm, rockchip, npu |
| Chipset | rk3576, rk3588 (add whichever applies) |
| Model family | qwen3, llama, gemma (lowercase) |
| Format | rkllm, rknn |

🌿 Contributing & Branch Flow

All development happens on the alpha branch. Promotions flow strictly forward: never commit directly to beta or main.

alpha  →  beta  →  main

| Action | Command |
|--------|---------|
| Promote to beta | git push origin alpha:beta |
| Promote to main (stable release) | git push origin beta:main |

These are fast-forward pushes: no checkout, no merge commit. beta is a 48-hour soak channel; if no bugs are filed in that time, it can be promoted to main. Never use --no-ff for promotions, as it creates merge commits that break future fast-forwards.


🀝 Credits & Acknowledgements

  • jundot/oMLX: Inspired the dashboard layout, metrics design, single-model lifecycle, and OpenAI compatibility structures.
  • Rockchip: SDKs and runtime libraries (librkllmrt.so) powering localized NPU inference.