NEXON is an AI model deployment platform for ONNX models. It serves feature-parity inference over REST and gRPC behind an Envoy gateway. Models are stored in MongoDB GridFS, and both services share a single inference orchestrator with an in-process LRU/TTL session cache. All services are containerized and health-checked for reliable bring-up, benchmarking, and grading.
- Upload, deploy, list, and delete ONNX models.
- Deploy ONNX models directly from an MLflow Tracking Server.
- Inference via REST and gRPC with identical request/response semantics.
- Inference endpoints:
  - REST: `POST /inference/infer/{model_name}`
  - gRPC: `InferenceService/Predict` (full proto FQMN: `nexon.grpc.inference.v1.InferenceService/Predict`)
- Envoy front door (single :8080 for REST + gRPC), admin UI on :9901.
- Health: REST `/healthz` (liveness) and `/readyz` (readiness), plus the gRPC Health service.
- Frontend: the React UI invokes REST model management endpoints via Envoy on :8080.
- Shared components: a centralized database module, the inference orchestrator, and an in-process ONNX Runtime session cache (LRU/TTL), used by both services.
- Reproducible stubs: proto files are compiled into Python gRPC stubs at build time, packaged as a wheel, and installed.
- Modern, modular Python layout suitable for benchmarking and coursework.
- Docker containerization with health checks and multi-stage builds for gRPC stubs.
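The REST inference route listed above can be exercised from any HTTP client. The sketch below builds such a request with only the Python standard library; note that the JSON payload shape (`{"inputs": ...}`) is an assumption for illustration and may differ from NEXON's actual schema.

```python
import json
import urllib.request


def build_infer_request(base_url: str, model_name: str, inputs: dict) -> urllib.request.Request:
    """Build a POST request for the REST inference endpoint.

    The route (POST /inference/infer/{model_name}) comes from this README;
    the {"inputs": ...} body shape is a hypothetical example payload.
    """
    url = f"{base_url}/inference/infer/{model_name}"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it, a running stack is required, e.g.:
# with urllib.request.urlopen(build_infer_request("http://localhost:8080", "mnist", {"x": [[0.0]]})) as resp:
#     print(json.load(resp))
```

Pointing `base_url` at Envoy (:8080) or at the REST service directly (:8000) should be equivalent, since both front the same orchestrator.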
This project requires a running Docker environment.
Please follow the official guide for your operating system below:
For macOS, install Docker Desktop.
- Official Guide: Docker Desktop for Mac
For Linux, install Docker Desktop or the Docker Engine suitable for your distribution.
- Official Guide: Docker Desktop for Linux
- Official Guide: Docker Engine (by distribution)
A complete setup on Windows requires installing WSL, then Docker Desktop with the WSL 2 backend enabled.
- Install WSL (Ubuntu): Official Guide: Install WSL
- Install Docker Desktop: Official Guide: Docker Desktop for Windows
- Enable WSL 2 Backend: Official Guide: Enable WSL 2 Backend
git clone https://github.com/Uni-Stuttgart-ESE/nexon.git
cd nexon

Create `.env` at the repo root (copy from `.env.example`):
# PowerShell / bash / zsh (recommended)
docker run --rm -v "${PWD}:/w" alpine:3 sh -lc 'cp /w/.env.example /w/.env'

For testing, no value changes are needed, but it is advised to change the passwords.
Important keys:
- `NEXON_MONGO_DB`: database name (default: `onnx_platform`).
- `LOG_HEALTH`: `1` logs health probes; `0` suppresses noisy health access logs.
- `ENABLE_REFLECTION`: `1` enables gRPC reflection (dev convenience).
- `GRPC_BIND`, `GRPC_MAX_RECV_BYTES`, `GRPC_MAX_SEND_BYTES`: advanced gRPC tuning.
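Putting those keys together, a minimal `.env` might look like the fragment below. All values are illustrative (the bind address and byte limits in particular are assumptions); `.env.example` remains the authoritative template.

```ini
# .env — illustrative values only; copy .env.example for the real defaults
NEXON_MONGO_DB=onnx_platform
LOG_HEALTH=0
ENABLE_REFLECTION=1
GRPC_BIND=0.0.0.0:50051
GRPC_MAX_RECV_BYTES=67108864   # 64 MiB, hypothetical limit
GRPC_MAX_SEND_BYTES=67108864   # 64 MiB, hypothetical limit
```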
To build and start the MLflow Tracking Server, S3, MySQL, and the initial MLflow experiments, use:

docker compose -f mlflow-compose.yml up --build -d

To build and start the NEXON frontend, backend, and MongoDB, use:

docker compose -f nexon-compose.yml up --build -d

- Check MLflow to make sure the initial models are registered.
- Run example requests from the `examples/` directory:

curl -X POST http://localhost:8000/api/mlflow/sync -H "Content-Type: application/json" -d @examples/test_step_1.json

Optional, for more readable responses:

curl -X POST http://localhost:8000/api/mlflow/sync -H "Content-Type: application/json" -d @examples/test_step_1.json | python -m json.tool

- Check NEXON to see your deployed models.
- Envoy (gateway): http://localhost:8080
- REST API docs (via Envoy): http://localhost:8080/docs
- REST service (direct): http://localhost:8000 (HTTP/1.1)
- gRPC service (direct): localhost:50051 (HTTP/2)
- MongoDB: localhost:27017
- Envoy admin: http://localhost:9901
Status & logs:
docker compose ps
docker compose logs -f rest
docker compose logs -f grpc
docker compose logs -f envoy

Note: gRPC stubs are generated during the Docker build into `/app/server/stubs/`, packaged as a wheel, and installed into the image. They are not committed to git.
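The build-time stub pipeline described in the note above might be sketched as a multi-stage Dockerfile like the following. This is an illustrative fragment, not the repository's actual Dockerfile: the stage names, base image, and paths are assumptions, and the wheel-packaging step is elided.

```dockerfile
# Illustrative sketch of the stub-building stage; the real Dockerfile,
# paths, and package names in the repo may differ.
FROM python:3.11-slim AS stub-builder
WORKDIR /build
RUN pip install --no-cache-dir grpcio-tools
COPY server/grpc_service/protos/ protos/
# Compile .proto files into Python stubs; a wheel build of the generated
# package would follow here before the runtime stage installs it.
RUN mkdir -p stubs && \
    python -m grpc_tools.protoc -I protos \
        --python_out=stubs --grpc_python_out=stubs protos/*.proto

FROM python:3.11-slim AS runtime
# COPY --from=stub-builder the built wheel and pip install it, so the
# generated stubs never need to be committed to git.
```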
Platform note
- macOS/Linux: run `make dev-bootstrap`.
- Windows: use WSL2 (Ubuntu) and run the same `make` commands.
# from repo root
make dev-bootstrap
# - creates .env if missing (defaults)
# - creates .venv, installs runtime + dev deps
# - generates protobuf/gRPC stubs, builds & installs the wheel
# - installs the app in editable mode and runs sanity checks

MongoDB:

make run-mongo-native

REST (FastAPI):

make run-rest

gRPC:

make run-grpc

Envoy (local):

# Uses localhost backends (8000/50051)
make run-envoy-dev

Frontend (REST-only):

cd frontend
npm install
npm start
# The UI calls REST model management endpoints (via Envoy on :8080).

Developer note (IDE imports)
The gRPC stubs (`inference_pb2*`) are generated inside the images. Your local IDE may still show unresolved imports if it isn't using the container's interpreter.

- Quick fix: after `make dev-bootstrap`, point your IDE at `.venv/bin/python` (on native Windows: `.venv\Scripts\python.exe`).

IDE setup tips (optional)

- PyCharm/IntelliJ: Settings → Project: Python Interpreter → Add → Existing → select `.venv/bin/python`
- VS Code: Command Palette → "Python: Select Interpreter" → choose `.venv`

No change is needed to run via Docker; this is just for editor IntelliSense.
nexon/
├── ops/envoy/
│   ├── envoy.compose.yaml   # Docker routing (service names: rest, grpc)
│   ├── envoy.dev.yaml       # Local routing (localhost:8000 / :50051)
│   └── logs/                # access logs
├── server/
│   ├── rest/                # FastAPI REST service; exposes /inference, /upload, /deployment
│   ├── grpc_service/        # Async gRPC service; protos in ./protos; stubs packaged as a wheel at build time
│   ├── shared/
│   │   ├── database.py      # MongoDB (Motor) + GridFS clients
│   │   ├── orchestrator.py  # shared inference orchestration
│   │   └── model_cache.py   # ONNXRuntime session cache (LRU/TTL)
│   └── tools/               # CLI test clients & micro-benchmarks
└── docker-compose.yml       # mongo + rest + grpc + envoy
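The `model_cache.py` component above combines LRU eviction with TTL expiry. A minimal sketch of that policy is below; this is not the actual NEXON implementation (real entries would hold `onnxruntime.InferenceSession` objects loaded from GridFS, and the `loader` callable is a stand-in for that).

```python
import time
from collections import OrderedDict


class SessionCache:
    """Toy LRU/TTL cache in the spirit of server/shared/model_cache.py.

    Illustrative only: entry values and the loader are placeholders for
    ONNX Runtime sessions built from model bytes.
    """

    def __init__(self, max_entries=4, ttl_seconds=300.0, clock=time.monotonic):
        self._entries = OrderedDict()  # name -> (session, inserted_at)
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock            # injectable for deterministic tests

    def get(self, name, loader):
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None:
            session, inserted_at = hit
            if now - inserted_at < self._ttl:
                self._entries.move_to_end(name)  # refresh LRU position
                return session
            del self._entries[name]              # expired: drop and reload
        session = loader(name)                   # e.g. build an ORT session
        self._entries[name] = (session, now)
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)    # evict least recently used
        return session
```

Keeping this cache in-process means both the REST and gRPC services reuse warm sessions through the shared orchestrator rather than re-deserializing models per request.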
This project includes two primary guides for validation:
- NEXON: Test Client: a simple CLI client for smoke testing and micro-benchmarking. Use it for quick validation and fast performance checks.
- NEXON: Local Testing & Evaluation Guide: the primary guide for formal evaluation. It contains the locally reproducible test suite with scripts for generating the key evidence artifacts referenced in the thesis.
This work extends the original NEXON project by Hussein Megahed (UI and initial REST workflow).
Key contributions in this research extension:
- gRPC Inference Service: low-latency, high-throughput inference (establishes a foundation for multiple communication protocols)
- Envoy gateway: unified ingress on :8080
- Shared components (used by both REST & gRPC):
  - Centralized database module
  - Inference orchestrator
  - In-process model cache for ONNX Runtime sessions
- REST workflow hardening: added health/readiness probes, OpenAPI/Swagger documentation, modular sub-apps
- Docker containerization and a reproducible protobuf/gRPC stubs pipeline