oRKLLM

              )       (
             ( \     / )          ██████╗ ██████╗ ██╗  ██╗██╗     ██╗     ███╗   ███╗
              \_\   /_/          ██╔═══██╗██╔══██╗██║ ██╔╝██║     ██║     ████╗ ████║
            .-----------.        ██║   ██║██████╔╝█████╔╝ ██║     ██║     ██╔████╔██║
           /  [*]   [*]  \       ██║   ██║██╔══██╗██╔═██╗ ██║     ██║     ██║╚██╔╝██║
          |    \  ω  /    |      ╚██████╔╝██║  ██║██║  ██╗███████╗███████╗██║ ╚═╝ ██║
           \  .-------.  /        ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚══════╝╚══════╝╚═╝     ╚═╝
          _/\/  #####  \/\_
         /  /   #####   \  \      Pronounced "ORC-EL-EL-EM"
        / ,/    #####    \, \     OpenAI-compatible LLM inference for Rockchip NPU.
       | / |  .-------.  | \ |    No cloud. No nonsense. Just efficient NPU inference.
       |/  '--[=======]--'  \|
       |       |     |       |
        \   ,  |     |  ,   /
         \  \. |     | ./  /
          '--' |     | '--'
               |     |
              / \   / \
             '   '-'   '

oRKLLM is an energy-efficient, OpenAI API-compatible local LLM inference server and premium admin console designed specifically for Rockchip NPU-powered platforms (such as the RK3576 found in the NanoPi M5 and RK3588 series SBCs).

Inspired by jundot/oMLX (which does the same for Apple Silicon), oRKLLM is adaptively re-engineered to run on the Rockchip RKLLM runtime (librkllmrt.so) with its unique hardware and concurrency constraints.

🚀 Key Features

OpenAI API Compatibility: Drop-in /v1/chat/completions, /v1/models, and /v1/embeddings endpoints — works with Open WebUI, Claude Code, and any OpenAI-compatible client.
Full Admin Console: Built with Vue 3 and Vuetify 3 — six dedicated pages:
- Dashboard — live CPU/NPU/GPU/RAM/Disk/Temperature gauges, serving stats, prefix cache observability, RKLLM runtime versions
- Models — local model manager, HuggingFace search, collection browser, direct downloader
- Settings — inference defaults, HF token, prefix cache config, trusted proxy
- Logs — full-page real-time log terminal over WebSocket
- Bench — inference benchmark (TTFT, prefill tok/s, generation tok/s)
- Chat — full streaming chat UI with conversation history sidebar (grouped by model), message queueing during inference, system prompt, model selector, and parameter controls
Conversation History: Chat sessions persisted in SQLite grouped by model. Collapsible sidebar on desktop, bottom-sheet on mobile. Partial responses saved via sendBeacon on page navigation.
Pin Model: Pin the active model to prevent idle auto-unload. Pin state persists across server restarts and triggers automatic model load on startup when sufficient RAM is available.
Multi-User Auth & RBAC: Local accounts or federated SSO via OIDC/SAML (Keycloak, Google, Azure AD). Two roles: admin and user. Site Management UI for user CRUD, auth provider config, and audit log.
OIDC / SAML SSO: Standard Flow with PKCE for public clients (no secret required). Group-to-role mapping from IdP claims. Routes at /auth/oidc/* and /auth/saml/*.
HuggingFace Integration: Search the HF Hub, browse collections, download .rkllm models directly. Search results show parameter count and storage size. A Compatible chipset filter auto-detects your SoC (RK3576/RK3588) from the board's device tree and appends it to the query — preventing downloads of models built for the wrong platform. The Download button queues all repo files simultaneously with per-file progress bars, speeds, and byte counters grouped by repo. Files saved to models/{repoName}/.
Prefix KV Cache: Tiered SSD hot/cold LRU cache saves KV state between conversation turns. Sliding context window (configurable up to 32,768 tokens, default 8,192) prevents NPU OOM on long conversations.
Process-Isolated Execution: Inference engine runs in a dedicated child process. Model unload/swap terminates the process, guaranteeing full NPU driver memory cleanup.
Smart Resource Management: Single active model lock, auto-swap, configurable idle timeout, pin-to-keep-loaded.
Runtime Version Auto-Matching & Auto-Download: oRKLLM reads the embedded version from each librkllmrt.so (via strings), matches it against the version in the model filename, and retries all candidates until one succeeds — caching the winner per model. On first setup, opt in to automatically download all versioned runtimes from mafischer/rkllm-runtimes (Apache 2.0). Opted-out users are prompted with a disclaimer dialog in the UI; API callers receive HTTP 422 RUNTIME_MISSING with the required version. Toggle in Settings after setup.
APT Distribution Channels: Three channels — stable (main), beta, alpha — with separate dists/<channel>/ directories on gh-pages. Users pin to their preferred channel.
Trusted Proxy: Supports true, single IP/CIDR, or comma-separated list (SAN-style) passed directly to Fastify's trustProxy.
Database Migrations: PRAGMA user_version migration runner — schema changes (v1–v3) apply automatically on startup, safe across upgrades from any previous version.
Seamless Mock Fallback: On non-ARM64/non-Linux platforms, oRKLLM falls back to a JS mock engine — rapid UI development on macOS/Windows without a board.
Dynamic N-API Bindings: C++ addon uses dlopen/dlsym — no compile-time dependency on librkllmrt.so.
Secure Auth: PBKDF2-HMAC-SHA256 password hashing, signed session cookies (userId|username|role|expires|HMAC), backward-compatible with single-user installs.

🛠️ Architecture & Tech Stack

graph TD
    Client[HTTP Client / Open WebUI] -->|REST API| Fastify[Fastify Server]
    Fastify -->|Admin SPA| Admin[Vue 3 / Vuetify Admin]
    Fastify -->|OpenAI Routes| API[OpenAI API Router]

    API -->|Queue Request| Pool[Engine Pool & Resource Manager]
    Pool -->|Spawn / Message| Worker[Worker Process]
    Worker -->|N-API Addon| Addon[orkllm_napi.node]
    Addon -->|Dynamic dlopen| C_API[librkllmrt.so C API]
    C_API -->|NPU Driver| NPU[Rockchip NPU Hardware]

    Admin -->|WebSocket Telemetry| Monitor[Telemetry Monitor]
    Monitor -->|/sys/kernel/debug/rknpu| Linux[Linux Kernel]

Layer	Technology
API Server	Node.js + Fastify (ES Modules)
Native Bindings	C++ N-API addon (`node-addon-api`) with `dlopen`/`dlsym`
Mock Fallback	Pure JS mock engine (auto-enabled on non-ARM64/non-Linux)
Frontend	Vue 3 + Vuetify 3 SPA, built with Vite, route-based code splitting
Database	SQLite via `node:sqlite` (Node ≥22.5) or `better-sqlite3` (Node 20)
Auth	Local PBKDF2 + OIDC (PKCE) + SAML 2.0
Testing	Playwright E2E (64 tests across 3 spec files), mock OIDC service container in CI

📦 Installing from a Release Package (Ubuntu / Armbian ARM64)

Pre-built .deb packages for ARM64 are available via the oRKLLM APT repository or directly from the GitHub Releases page.

Option A — APT repository (recommended)

Three channels are available:

Channel	Branch	Description
`stable`	`main`	Production releases — recommended for most users
`beta`	`beta`	Release candidates promoted from alpha after 48 h with no bug reports
`alpha`	`alpha`	Cutting-edge development builds

# Trust the oRKLLM signing key
curl -fsSL https://mafischer.github.io/oRKLLM/orkllm.gpg \
  | sudo gpg --dearmor -o /usr/share/keyrings/orkllm.gpg

# Add the repository — replace 'stable' with 'beta' or 'alpha' to follow pre-releases
echo "deb [arch=arm64 signed-by=/usr/share/keyrings/orkllm.gpg] \
  https://mafischer.github.io/oRKLLM stable main" \
  | sudo tee /etc/apt/sources.list.d/orkllm.list

sudo apt update && sudo apt install orkllm

Option B — Direct download

VERSION=0.7.0
wget https://github.com/mafischer/oRKLLM/releases/latest/download/orkllm_${VERSION}_arm64.deb
sudo dpkg -i orkllm_${VERSION}_arm64.deb

Configure

sudo nano /etc/orkllm/orkllm.conf

ORKLLM_HOST=0.0.0.0
ORKLLM_PORT=8000
ORKLLM_LIB_PATH=/usr/lib/librkllmrt.so
ORKLLM_MODELS_DIR=/var/lib/orkllm/models
ORKLLM_DB_PATH=/var/lib/orkllm/orkllm.db

Add models and start

sudo cp your_model.rkllm /var/lib/orkllm/models/
sudo systemctl start orkllm

Admin console: http://<device-ip>:8000/admin

Service management

sudo systemctl start|stop|restart|status orkllm
journalctl -u orkllm -f

⚙️ Installation from Source

Prerequisites

Node.js ≥ 18 (≥ 22.5 preferred for native node:sqlite)
node-gyp dependencies: Python 3, C++ compiler (Xcode CLT on macOS, build-essential on Linux)
A compiled .rkllm model (use rkllm-toolkit to convert from HuggingFace)
librkllmrt.so on the target board (typically at /usr/lib/librkllmrt.so)

Setup & Run

# Install all dependencies (compiles native addon)
npm install

# Build Vue frontend
npm run build:frontend

# Start development server (mock engine auto-enabled on macOS)
npm run dev:server
# → http://localhost:8000/admin

Environment Variables

Variable	Default	Description
`ORKLLM_HOST`	`127.0.0.1`	Listen address (`0.0.0.0` for LAN)
`ORKLLM_PORT`	`8000`	Listen port
`ORKLLM_LIB_PATH`	`/usr/lib/librkllmrt.so`	Path to Rockchip RKLLM runtime
`ORKLLM_MODELS_DIR`	`./models`	Directory scanned for `.rkllm` files
`ORKLLM_DB_PATH`	`~/.config/orkllm/auth.db`	SQLite database path
`ORKLLM_TRUSTED_PROXY`	(unset)	`true` (all), a single IP/CIDR, or comma-separated IPs/CIDRs to trust `X-Forwarded-*` headers
`ORKLLM_RUNTIMES_DIR`	`~/.config/orkllm/runtimes`	Directory of versioned `librkllmrt-aarch64-vX.Y.Z.so` files for automatic runtime matching

🧪 Running Tests

# Full E2E suite (mock mode, no board required)
npm test

# SSO integration tests using local Keycloak container (same as CI)
npm run test:sso        # starts Keycloak + runs SSO tests
npm run test:sso:down   # tear down Keycloak when done

CI runs the full suite including OIDC SSO via a containerised Keycloak instance with a pre-configured orkllm realm.

Test environment variables

Set these in .env locally (gitignored) or as GitHub Actions secrets/variables. The .env file is loaded automatically by Playwright.

Variable	Where	Description
`ORKLLM_TEST_ADMIN_USER`	Secret	Admin username created during test setup
`ORKLLM_TEST_ADMIN_PASS`	Secret	Admin password
`ORKLLM_TEST_OIDC_ISSUER`	Secret	Real Keycloak issuer URL (for `ORKLLM_TEST_LIVE=1`)
`ORKLLM_TEST_OIDC_CLIENT_ID`	Secret	OIDC client ID (`orkllm-oidc`)
`ORKLLM_TEST_SAML_METADATA_URL`	Secret	Real Keycloak SAML metadata URL
`ORKLLM_TEST_OIDC_USER`	Secret	Keycloak test user (`testuser`)
`ORKLLM_TEST_OIDC_USER_PASS`	Secret	Keycloak test user password
`ORKLLM_TEST_OIDC_ADMIN_USER`	Secret	Keycloak admin test user (`testadminuser`)
`ORKLLM_TEST_OIDC_ADMIN_PASS`	Secret	Keycloak admin test user password
`ORKLLM_TEST_MOCK_OIDC_URL`	Auto-set	Issuer URL of CI Keycloak container (`http://localhost:8080/realms/orkllm`)
`ORKLLM_TEST_REDIRECT_BASE`	Auto-set	Base URL for OIDC `redirect_uri` — derived from this so protocol is correct (`http://` in CI, `https://` live)
`ORKLLM_TEST_LIVE`	Variable	Set to `1` to run SSO tests against real Keycloak on LAN
`ORKLLM_TEST_LIVE_URL`	Variable	Live server URL (e.g. `https://orkllm.fischerapps.com`)

Debugging failed CI tests

When E2E tests fail in CI, Playwright uploads screenshots and error context as an artifact named playwright-report (retained 7 days).

Download via CLI:

gh run download <run-id> --name playwright-report -D /tmp/report
# Find the run ID with: gh run list --limit 5

Download via browser: GitHub Actions run → Summary → Artifacts section at the bottom → download playwright-report.zip.

Each failed test has a test-failed-1.png screenshot and an error-context.md with the stack trace, making it easy to see exactly what the browser showed at the point of failure.

⚙️ RKLLM Runtime Auto-Downloader

oRKLLM requires a versioned copy of Rockchip's librkllmrt.so runtime library to drive NPU inference. Each .rkllm model file is compiled against a specific runtime version (e.g. 1.2.3), and loading a model with the wrong version fails immediately.

How it works

oRKLLM parses the runtime version from the model filename (e.g. Qwen3-8B-rk3576-w4a16-**1.2.3**.rkllm).
It searches ORKLLM_RUNTIMES_DIR (~/.config/orkllm/runtimes/ by default) for a matching librkllmrt-aarch64-v1.2.3.so.
If none matches, it retries with all other available runtimes newest-first, then falls back to the system /usr/lib/librkllmrt.so.
The winning library is cached per model so future loads skip straight to it.

Auto-download (opt-in)

During first-time setup you are prompted to opt in to auto-downloading runtimes. When enabled:

All available runtime versions are downloaded in the background at server startup.
When a model is loaded whose required runtime is not yet present, oRKLLM downloads it automatically before retrying the load.
The toggle can be changed at any time in Settings → Runtime Auto-Download.

When opted out, the UI shows a disclaimer dialog before downloading, and API callers receive HTTP 422 RUNTIME_MISSING with the required version.

Runtime mirror

Pre-built librkllmrt.so binaries for aarch64 and armhf are published at:

github.com/mafischer/rkllm-runtimes

The mirror syncs from airockchip/rknn-llm nightly. All versions from v1.0.1 onward are available.

Direct download

VERSION=v1.2.3
ARCH=aarch64   # or armhf

curl -fsSL \
  https://github.com/mafischer/rkllm-runtimes/releases/download/${VERSION}/librkllmrt-${ARCH}-${VERSION}.so \
  -o ~/.config/orkllm/runtimes/librkllmrt-${ARCH}-${VERSION}.so

Licensing

librkllmrt.so is Rockchip proprietary software distributed by Airockchip under the Apache 2.0 License as part of the rknn-llm repository. The Apache 2.0 license explicitly permits redistribution with attribution. The mirror at mafischer/rkllm-runtimes reproduces this license in full on every release.

oRKLLM does not modify the binaries. They are downloaded verbatim from the upstream repository and re-published as properly versioned GitHub release artifacts for programmatic access.

📐 Model Naming Convention

To help establish consistency across the fragmented Rockchip community, oRKLLM adopts a single unified naming convention for both the HuggingFace repository and the .rkllm file inside it.

Unified format

{Family}-{Params}-{Variant}-{Chipset}-{Quant}-{Algo}-v{Version}-RKLLM.rkllm

The HuggingFace repository name is the same string without the .rkllm extension.

Example: Qwen3-4B-Base-rk3576-w4a16-grq-v1.2.3-RKLLM
File inside repo: Qwen3-4B-Base-rk3576-w4a16-grq-v1.2.3-RKLLM.rkllm

Field	Description	Example
`Family`	Base model name	`Qwen3`, `Llama3`, `Gemma2`
`Params`	Parameter count	`4B`, `8B`, `0.5B`, `35BA3B`
`Variant`	Model variant	`Base`, `Instruct`, `Chat`
`Chipset`	Target Rockchip SoC	`rk3576`, `rk3588`
`Quant`	Quantization type	`w4a16`, `w8a8`
`Algo`	Quantization algorithm	`grq`, `awq`, `gptq`
`Version`	rkllm-toolkit version (with `v` prefix)	`v1.2.3`
`RKLLM`	Required suffix for HuggingFace discoverability	—

Note: oRKLLM parses the runtime version from the v{Version} field in the filename to auto-select the correct librkllmrt.so. Always include the version. Legacy files without the v prefix and -RKLLM suffix are also supported.

Recommended HuggingFace tags

Including these tags maximises discoverability and enables oRKLLM's compatible-chipset search filter to surface your model:

Category	Tags
Core	`rkllm`, `rockchip`, `npu`
Chipset	`rk3576`, `rk3588` (add whichever applies)
Model family	`qwen3`, `llama`, `gemma` (lowercase)
Format	`rkllm`, `rknn`

🌿 Contributing & Branch Flow

All development happens on the alpha branch. Promotions flow strictly forward — never commit directly to beta or main.

alpha  →  beta  →  main

Action	Command
Promote to beta	`git push origin alpha:beta`
Promote to main (stable release)	`git push origin beta:main`

These are fast-forward pushes — no checkout, no merge commit. beta is a 48-hour soak channel; if no bugs are filed it can be promoted to main. Never use --no-ff for promotions as it creates merge commits that break future fast-forwards.

🤝 Credits & Acknowledgements

jundot/oMLX: Inspired the dashboard layout, metrics design, single-model lifecycle, and OpenAI compatibility structures.
Rockchip: SDKs and runtime libraries (librkllmrt.so) powering localized NPU inference.

Name		Name	Last commit message	Last commit date
Latest commit History 237 Commits
.github		.github
debian		debian
e2e		e2e
frontend		frontend
scripts		scripts
src		src
.gitignore		.gitignore
.releaserc.json		.releaserc.json
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
GEMINI.md		GEMINI.md
LICENSE		LICENSE
README.md		README.md
binding.gyp		binding.gyp
docker-compose.test.yml		docker-compose.test.yml
package-lock.json		package-lock.json
package.json		package.json
playwright.config.js		playwright.config.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

oRKLLM

🚀 Key Features

🛠️ Architecture & Tech Stack

📦 Installing from a Release Package (Ubuntu / Armbian ARM64)

Option A — APT repository (recommended)

Option B — Direct download

Configure

Add models and start

Service management

⚙️ Installation from Source

Prerequisites

Setup & Run

Environment Variables

🧪 Running Tests

Test environment variables

Debugging failed CI tests

⚙️ RKLLM Runtime Auto-Downloader

How it works

Auto-download (opt-in)

Runtime mirror

Direct download

Licensing

📐 Model Naming Convention

Unified format

Recommended HuggingFace tags

🌿 Contributing & Branch Flow

🤝 Credits & Acknowledgements

About

Uh oh!

Releases 59

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

oRKLLM

🚀 Key Features

🛠️ Architecture & Tech Stack

📦 Installing from a Release Package (Ubuntu / Armbian ARM64)

Option A — APT repository (recommended)

Option B — Direct download

Configure

Add models and start

Service management

⚙️ Installation from Source

Prerequisites

Setup & Run

Environment Variables

🧪 Running Tests

Test environment variables

Debugging failed CI tests

⚙️ RKLLM Runtime Auto-Downloader

How it works

Auto-download (opt-in)

Runtime mirror

Direct download

Licensing

📐 Model Naming Convention

Unified format

Recommended HuggingFace tags

🌿 Contributing & Branch Flow

🤝 Credits & Acknowledgements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 59

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages