Pronounced "ORC-EL-EL-EM"

OpenAI-compatible LLM inference for Rockchip NPU.

No cloud. No nonsense. Just efficient NPU inference.

oRKLLM is an energy-efficient, OpenAI API-compatible local LLM inference server and premium admin console designed specifically for Rockchip NPU-powered platforms (such as the RK3576 found in the NanoPi M5 and RK3588 series SBCs).
Inspired by jundot/oMLX (which does the same for Apple Silicon), oRKLLM is adaptively re-engineered to run on the Rockchip RKLLM runtime (librkllmrt.so) with its unique hardware and concurrency constraints.
- OpenAI API Compatibility: Drop-in /v1/chat/completions, /v1/models, and /v1/embeddings endpoints. Works with Open WebUI, Claude Code, and any OpenAI-compatible client.
- Full Admin Console: Built with Vue 3 and Vuetify 3, with six dedicated pages:
  - Dashboard: live CPU/NPU/GPU/RAM/Disk/Temperature gauges, serving stats, prefix cache observability, RKLLM runtime versions
  - Models: local model manager, HuggingFace search, collection browser, direct downloader
  - Settings: inference defaults, HF token, prefix cache config, trusted proxy
  - Logs: full-page real-time log terminal over WebSocket
  - Bench: inference benchmark (TTFT, prefill tok/s, generation tok/s)
  - Chat: full streaming chat UI with conversation history sidebar (grouped by model), message queueing during inference, system prompt, model selector, and parameter controls
- Conversation History: Chat sessions are persisted in SQLite, grouped by model. Collapsible sidebar on desktop, bottom sheet on mobile. Partial responses are saved via sendBeacon on page navigation.
- Pin Model: Pin the active model to prevent idle auto-unload. Pin state persists across server restarts and triggers automatic model load on startup when sufficient RAM is available.
- Multi-User Auth & RBAC: Local accounts or federated SSO via OIDC/SAML (Keycloak, Google, Azure AD). Two roles: admin and user. Site Management UI for user CRUD, auth provider config, and audit log.
- OIDC / SAML SSO: Standard Flow with PKCE for public clients (no secret required). Group-to-role mapping from IdP claims. Routes at /auth/oidc/* and /auth/saml/*.
- HuggingFace Integration: Search the HF Hub, browse collections, and download .rkllm models directly. Search results show parameter count and storage size. A Compatible chipset filter auto-detects your SoC (RK3576/RK3588) from the board's device tree and appends it to the query, preventing downloads of models built for the wrong platform. The Download button queues all repo files simultaneously, with per-file progress bars, speeds, and byte counters grouped by repo. Files are saved to models/{repoName}/.
- Prefix KV Cache: Tiered SSD hot/cold LRU cache saves KV state between conversation turns. Sliding context window (configurable up to 32,768 tokens, default 8,192) prevents NPU OOM on long conversations.
- Process-Isolated Execution: Inference engine runs in a dedicated child process. Model unload/swap terminates the process, guaranteeing full NPU driver memory cleanup.
- Smart Resource Management: Single active model lock, auto-swap, configurable idle timeout, pin-to-keep-loaded.
- Runtime Version Auto-Matching & Auto-Download: oRKLLM reads the embedded version from each librkllmrt.so (via strings), matches it against the version in the model filename, and retries all candidates until one succeeds, caching the winner per model. On first setup, opt in to automatically download all versioned runtimes from mafischer/rkllm-runtimes (Apache 2.0). Opted-out users are prompted with a disclaimer dialog in the UI; API callers receive HTTP 422 RUNTIME_MISSING with the required version. Toggle in Settings after setup.
- APT Distribution Channels: Three channels, stable (main), beta, and alpha, with separate dists/<channel>/ directories on gh-pages. Users pin to their preferred channel.
- Trusted Proxy: Supports true, a single IP/CIDR, or a comma-separated list (SAN-style) passed directly to Fastify's trustProxy.
- Database Migrations: PRAGMA user_version migration runner. Schema changes (v1 to v3) apply automatically on startup, safe across upgrades from any previous version.
- Seamless Mock Fallback: On non-ARM64/non-Linux platforms, oRKLLM falls back to a JS mock engine, allowing rapid UI development on macOS/Windows without a board.
- Dynamic N-API Bindings: C++ addon uses dlopen/dlsym, so there is no compile-time dependency on librkllmrt.so.
- Secure Auth: PBKDF2-HMAC-SHA256 password hashing, signed session cookies (userId|username|role|expires|HMAC), backward-compatible with single-user installs.

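The signed session cookie layout listed above (userId|username|role|expires|HMAC) can be illustrated with a small sketch. This is not oRKLLM's actual implementation: the function names, the use of HMAC-SHA256 for the cookie signature, the hex encoding, and the millisecond expiry are all assumptions made for illustration. Only the field order comes from the feature list.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: joins the four fields with "|" and appends an HMAC.
// Assumes no field contains "|" (a real implementation would need to enforce that).
function signSession({ userId, username, role, expires }, secret) {
  const payload = [userId, username, role, expires].join("|");
  const mac = crypto.createHmac("sha256", secret).update(payload).digest("hex");
  return `${payload}|${mac}`;
}

// Hypothetical helper: returns the session fields, or null if the cookie is
// malformed, tampered with, or expired. `expires` is assumed to be epoch ms.
function verifySession(cookie, secret, now = Date.now()) {
  const parts = cookie.split("|");
  if (parts.length !== 5) return null;
  const [userId, username, role, expires, mac] = parts;
  const payload = [userId, username, role, expires].join("|");
  const expected = crypto.createHmac("sha256", secret).update(payload).digest("hex");
  const a = Buffer.from(mac, "utf8");
  const b = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (a.length !== b.length || !crypto.timingSafeEqual(a, b)) return null;
  if (Number(expires) <= now) return null;
  return { userId, username, role, expires: Number(expires) };
}
```

Because the role is part of the signed payload, a client cannot change user to admin without invalidating the signature.
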
graph TD
Client[HTTP Client / Open WebUI] -->|REST API| Fastify[Fastify Server]
Fastify -->|Admin SPA| Admin[Vue 3 / Vuetify Admin]
Fastify -->|OpenAI Routes| API[OpenAI API Router]
API -->|Queue Request| Pool[Engine Pool & Resource Manager]
Pool -->|Spawn / Message| Worker[Worker Process]
Worker -->|N-API Addon| Addon[orkllm_napi.node]
Addon -->|Dynamic dlopen| C_API[librkllmrt.so C API]
C_API -->|NPU Driver| NPU[Rockchip NPU Hardware]
Admin -->|WebSocket Telemetry| Monitor[Telemetry Monitor]
Monitor -->|/sys/kernel/debug/rknpu| Linux[Linux Kernel]
| Layer | Technology |
|---|---|
| API Server | Node.js + Fastify (ES Modules) |
| Native Bindings | C++ N-API addon (node-addon-api) with dlopen/dlsym |
| Mock Fallback | Pure JS mock engine (auto-enabled on non-ARM64/non-Linux) |
| Frontend | Vue 3 + Vuetify 3 SPA, built with Vite, route-based code splitting |
| Database | SQLite via node:sqlite (Node ≥ 22.5) or better-sqlite3 (Node 20) |
| Auth | Local PBKDF2 + OIDC (PKCE) + SAML 2.0 |
| Testing | Playwright E2E (64 tests across 3 spec files), mock OIDC service container in CI |
Pre-built .deb packages for ARM64 are available via the oRKLLM APT repository or directly from the GitHub Releases page.
Three channels are available:
| Channel | Branch | Description |
|---|---|---|
| stable | main | Production releases, recommended for most users |
| beta | beta | Release candidates promoted from alpha after 48 h with no bug reports |
| alpha | alpha | Cutting-edge development builds |

# Trust the oRKLLM signing key
curl -fsSL https://mafischer.github.io/oRKLLM/orkllm.gpg \
| sudo gpg --dearmor -o /usr/share/keyrings/orkllm.gpg
# Add the repository (replace 'stable' with 'beta' or 'alpha' to follow pre-releases)
echo "deb [arch=arm64 signed-by=/usr/share/keyrings/orkllm.gpg] \
https://mafischer.github.io/oRKLLM stable main" \
| sudo tee /etc/apt/sources.list.d/orkllm.list
sudo apt update && sudo apt install orkllm

VERSION=0.7.0
wget https://github.com/mafischer/oRKLLM/releases/latest/download/orkllm_${VERSION}_arm64.deb
sudo dpkg -i orkllm_${VERSION}_arm64.deb

sudo nano /etc/orkllm/orkllm.conf

ORKLLM_HOST=0.0.0.0
ORKLLM_PORT=8000
ORKLLM_LIB_PATH=/usr/lib/librkllmrt.so
ORKLLM_MODELS_DIR=/var/lib/orkllm/models
ORKLLM_DB_PATH=/var/lib/orkllm/orkllm.db

sudo cp your_model.rkllm /var/lib/orkllm/models/
sudo systemctl start orkllm

Admin console: http://<device-ip>:8000/admin

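Once the service is running, any OpenAI-style client should be able to talk to it. The sketch below only builds the request for the /v1/chat/completions endpoint named earlier; the helper name, the example IP address, and the model name are placeholders, and whether your install requires an API key or session is not covered here.

```javascript
// Hypothetical helper: builds (but does not send) an OpenAI-style chat request.
function buildChatRequest(baseUrl, model, userText) {
  return {
    url: new URL("/v1/chat/completions", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: userText }],
        stream: false,
      }),
    },
  };
}

// Usage, assuming a reachable server and Node 18+ (global fetch):
//   const { url, init } = buildChatRequest("http://192.168.1.50:8000", "my-model", "Hello");
//   const res = await fetch(url, init);
//   const data = await res.json();
//   console.log(data.choices[0].message.content); // standard OpenAI response shape
```
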
sudo systemctl start|stop|restart|status orkllm
journalctl -u orkllm -f

- Node.js ≥ 18 (≥ 22.5 preferred for native node:sqlite)
- node-gyp dependencies: Python 3, C++ compiler (Xcode CLT on macOS, build-essential on Linux)
- A compiled .rkllm model (use rkllm-toolkit to convert from HuggingFace)
- librkllmrt.so on the target board (typically at /usr/lib/librkllmrt.so)

# Install all dependencies (compiles native addon)
npm install
# Build Vue frontend
npm run build:frontend
# Start development server (mock engine auto-enabled on macOS)
npm run dev:server
# → http://localhost:8000/admin

| Variable | Default | Description |
|---|---|---|
| ORKLLM_HOST | 127.0.0.1 | Listen address (0.0.0.0 for LAN) |
| ORKLLM_PORT | 8000 | Listen port |
| ORKLLM_LIB_PATH | /usr/lib/librkllmrt.so | Path to Rockchip RKLLM runtime |
| ORKLLM_MODELS_DIR | ./models | Directory scanned for .rkllm files |
| ORKLLM_DB_PATH | ~/.config/orkllm/auth.db | SQLite database path |
| ORKLLM_TRUSTED_PROXY | (unset) | true (all), a single IP/CIDR, or comma-separated IPs/CIDRs to trust X-Forwarded-* headers |
| ORKLLM_RUNTIMES_DIR | ~/.config/orkllm/runtimes | Directory of versioned librkllmrt-aarch64-vX.Y.Z.so files for automatic runtime matching |

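As an illustration of how these variables and defaults fit together, here is a minimal sketch of a config loader. The function names and the returned object shape are hypothetical and not taken from the oRKLLM source. The defaults mirror the table, and the trusted-proxy value is kept in a form that Fastify's trustProxy option accepts (a boolean or an IP/CIDR string, optionally comma-separated).

```javascript
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: maps ORKLLM_TRUSTED_PROXY to a trustProxy value.
// Unset or empty means "trust nothing"; "true" means "trust all proxies".
function parseTrustedProxy(raw) {
  if (raw === undefined || raw.trim() === "") return false;
  if (raw.trim().toLowerCase() === "true") return true;
  // Normalise whitespace around commas, keep the comma-separated string form.
  return raw.split(",").map((s) => s.trim()).filter(Boolean).join(",");
}

// Hypothetical loader: reads the documented variables, applying table defaults.
function loadConfig(env = process.env) {
  const configDir = path.join(os.homedir(), ".config", "orkllm");
  return {
    host: env.ORKLLM_HOST ?? "127.0.0.1",
    port: Number(env.ORKLLM_PORT ?? 8000),
    libPath: env.ORKLLM_LIB_PATH ?? "/usr/lib/librkllmrt.so",
    modelsDir: env.ORKLLM_MODELS_DIR ?? "./models",
    dbPath: env.ORKLLM_DB_PATH ?? path.join(configDir, "auth.db"),
    trustProxy: parseTrustedProxy(env.ORKLLM_TRUSTED_PROXY),
    runtimesDir: env.ORKLLM_RUNTIMES_DIR ?? path.join(configDir, "runtimes"),
  };
}
```
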
# Full E2E suite (mock mode, no board required)
npm test
# SSO integration tests using local Keycloak container (same as CI)
npm run test:sso # starts Keycloak + runs SSO tests
npm run test:sso:down   # tear down Keycloak when done

CI runs the full suite including OIDC SSO via a containerised Keycloak instance with a pre-configured orkllm realm.
Set these in .env locally (gitignored) or as GitHub Actions secrets/variables. The .env file is loaded automatically by Playwright.
| Variable | Where | Description |
|---|---|---|
| ORKLLM_TEST_ADMIN_USER | Secret | Admin username created during test setup |
| ORKLLM_TEST_ADMIN_PASS | Secret | Admin password |
| ORKLLM_TEST_OIDC_ISSUER | Secret | Real Keycloak issuer URL (for ORKLLM_TEST_LIVE=1) |
| ORKLLM_TEST_OIDC_CLIENT_ID | Secret | OIDC client ID (orkllm-oidc) |
| ORKLLM_TEST_SAML_METADATA_URL | Secret | Real Keycloak SAML metadata URL |
| ORKLLM_TEST_OIDC_USER | Secret | Keycloak test user (testuser) |
| ORKLLM_TEST_OIDC_USER_PASS | Secret | Keycloak test user password |
| ORKLLM_TEST_OIDC_ADMIN_USER | Secret | Keycloak admin test user (testadminuser) |
| ORKLLM_TEST_OIDC_ADMIN_PASS | Secret | Keycloak admin test user password |
| ORKLLM_TEST_MOCK_OIDC_URL | Auto-set | Issuer URL of CI Keycloak container (http://localhost:8080/realms/orkllm) |
| ORKLLM_TEST_REDIRECT_BASE | Auto-set | Base URL for the OIDC redirect_uri, which is derived from this value so the protocol is correct (http:// in CI, https:// live) |
| ORKLLM_TEST_LIVE | Variable | Set to 1 to run SSO tests against real Keycloak on LAN |
| ORKLLM_TEST_LIVE_URL | Variable | Live server URL (e.g. https://orkllm.fischerapps.com) |

When E2E tests fail in CI, Playwright uploads screenshots and error context as an artifact named playwright-report (retained 7 days).
Download via CLI:
gh run download <run-id> --name playwright-report -D /tmp/report
# Find the run ID with: gh run list --limit 5

Download via browser: GitHub Actions run → Summary → Artifacts section at the bottom → download playwright-report.zip.
Each failed test has a test-failed-1.png screenshot and an error-context.md with the stack trace, making it easy to see exactly what the browser showed at the point of failure.
oRKLLM requires a versioned copy of Rockchip's librkllmrt.so runtime library to drive NPU inference. Each .rkllm model file is compiled against a specific runtime version (e.g. 1.2.3), and loading a model with the wrong version fails immediately.
- oRKLLM parses the runtime version from the model filename (e.g. Qwen3-8B-rk3576-w4a16-**1.2.3**.rkllm).
- It searches ORKLLM_RUNTIMES_DIR (~/.config/orkllm/runtimes/ by default) for a matching librkllmrt-aarch64-v1.2.3.so.
- If none matches, it retries with all other available runtimes newest-first, then falls back to the system /usr/lib/librkllmrt.so.
- The winning library is cached per model so future loads skip straight to it.

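The lookup order described in the steps above can be sketched as a pure function. This is an illustration under assumptions, not the project's actual code: the function names are hypothetical, the list of available versions is passed in rather than read from disk, and the per-model cache is left out.

```javascript
// Extracts "1.2.3" from names such as "Qwen3-8B-rk3576-w4a16-1.2.3.rkllm"
// or "...-v1.2.3-RKLLM.rkllm". Returns null when no version is found.
function modelRuntimeVersion(modelFile) {
  const m = /-v?(\d+\.\d+\.\d+)(?:-RKLLM)?\.rkllm$/.exec(modelFile);
  return m ? m[1] : null;
}

// Numeric, newest-first comparison of "X.Y.Z" strings (so 1.2.10 > 1.2.3).
function compareVersionsDesc(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pb[i] - pa[i];
  }
  return 0;
}

// Returns library names to try in order: exact match first, then the other
// available runtimes newest-first, then the system library as a last resort.
function runtimeCandidates(modelFile, availableVersions, systemLib = "/usr/lib/librkllmrt.so") {
  const wanted = modelRuntimeVersion(modelFile);
  const sorted = [...availableVersions].sort(compareVersionsDesc);
  const exact = sorted.filter((v) => v === wanted);
  const others = sorted.filter((v) => v !== wanted);
  return [...exact, ...others]
    .map((v) => `librkllmrt-aarch64-v${v}.so`)
    .concat(systemLib);
}
```

A caller would try each candidate in turn and remember the first one that loads successfully for that model.
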
During first-time setup you are prompted to opt in to auto-downloading runtimes. When enabled:
- All available runtime versions are downloaded in the background at server startup.
- When a model is loaded whose required runtime is not yet present, oRKLLM downloads it automatically before retrying the load.
- The toggle can be changed at any time in Settings → Runtime Auto-Download.
When opted out, the UI shows a disclaimer dialog before downloading, and API callers receive HTTP 422 RUNTIME_MISSING with the required version.
Pre-built librkllmrt.so binaries for aarch64 and armhf are published at:
github.com/mafischer/rkllm-runtimes
The mirror syncs from airockchip/rknn-llm nightly. All versions from v1.0.1 onward are available.
VERSION=v1.2.3
ARCH=aarch64 # or armhf
curl -fsSL \
https://github.com/mafischer/rkllm-runtimes/releases/download/${VERSION}/librkllmrt-${ARCH}-${VERSION}.so \
  -o ~/.config/orkllm/runtimes/librkllmrt-${ARCH}-${VERSION}.so

librkllmrt.so is Rockchip proprietary software distributed by Airockchip under the Apache 2.0 License as part of the rknn-llm repository. The Apache 2.0 license explicitly permits redistribution with attribution. The mirror at mafischer/rkllm-runtimes reproduces this license in full on every release.
oRKLLM does not modify the binaries. They are downloaded verbatim from the upstream repository and re-published as properly versioned GitHub release artifacts for programmatic access.
To help establish consistency across the fragmented Rockchip community, oRKLLM adopts a single unified naming convention for both the HuggingFace repository and the .rkllm file inside it.
{Family}-{Params}-{Variant}-{Chipset}-{Quant}-{Algo}-v{Version}-RKLLM.rkllm
The HuggingFace repository name is the same string without the .rkllm extension.
Example: Qwen3-4B-Base-rk3576-w4a16-grq-v1.2.3-RKLLM
File inside repo: Qwen3-4B-Base-rk3576-w4a16-grq-v1.2.3-RKLLM.rkllm
| Field | Description | Example |
|---|---|---|
| Family | Base model name | Qwen3, Llama3, Gemma2 |
| Params | Parameter count | 4B, 8B, 0.5B, 35BA3B |
| Variant | Model variant | Base, Instruct, Chat |
| Chipset | Target Rockchip SoC | rk3576, rk3588 |
| Quant | Quantization type | w4a16, w8a8 |
| Algo | Quantization algorithm | grq, awq, gptq |
| Version | rkllm-toolkit version (with v prefix) | v1.2.3 |
| RKLLM | Required suffix for HuggingFace discoverability | - |

Note: oRKLLM parses the runtime version from the v{Version} field in the filename to auto-select the correct librkllmrt.so. Always include the version. Legacy files without the v prefix and -RKLLM suffix are also supported.

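To make the convention concrete, the following sketch splits a conforming name into its fields and rebuilds it. The function names are hypothetical and the parser is deliberately strict: it assumes no field contains a hyphen and does not handle the legacy names mentioned in the note.

```javascript
const FIELDS = ["family", "params", "variant", "chipset", "quant", "algo"];

// Parses "{Family}-{Params}-{Variant}-{Chipset}-{Quant}-{Algo}-v{Version}-RKLLM"
// with or without the ".rkllm" extension. Returns null for non-conforming names.
function parseModelName(name) {
  const base = name.replace(/\.rkllm$/, "");
  const parts = base.split("-");
  if (parts.length !== 8 || parts[7] !== "RKLLM") return null;
  const version = /^v(\d+\.\d+\.\d+)$/.exec(parts[6]);
  if (!version) return null;
  const out = { version: version[1] };
  FIELDS.forEach((field, i) => {
    out[field] = parts[i];
  });
  return out;
}

// Rebuilds the repository name (no extension) from parsed fields.
function formatModelName(f) {
  return [f.family, f.params, f.variant, f.chipset, f.quant, f.algo, `v${f.version}`, "RKLLM"].join("-");
}
```

Appending .rkllm to the output of formatModelName gives the file name expected inside the repository.
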
Including these tags maximises discoverability and enables oRKLLM's compatible-chipset search filter to surface your model:
| Category | Tags |
|---|---|
| Core | rkllm, rockchip, npu |
| Chipset | rk3576, rk3588 (add whichever applies) |
| Model family | qwen3, llama, gemma (lowercase) |
| Format | rkllm, rknn |
All development happens on the alpha branch. Promotions flow strictly forward: never commit directly to beta or main.
alpha → beta → main
| Action | Command |
|---|---|
| Promote to beta | git push origin alpha:beta |
| Promote to main (stable release) | git push origin beta:main |
These are fast-forward pushes: no checkout, no merge commit. beta is a 48-hour soak channel; if no bugs are filed in that window, it can be promoted to main. Never use --no-ff for promotions, because it creates merge commits that break future fast-forwards.
- jundot/oMLX: Inspired the dashboard layout, metrics design, single-model lifecycle, and OpenAI compatibility structures.
- Rockchip: SDKs and runtime libraries (librkllmrt.so) powering localized NPU inference.