NEXON

NEXON is an AI model deployment platform for ONNX models. It serves feature-parity inference over REST and gRPC behind an Envoy gateway. Models are stored in MongoDB GridFS, and both services share a single inference orchestrator with an in-process LRU/TTL session cache. All services are containerized and health-checked for reliable bring-up, benchmarking, and grading.

🚀 Features

  • Upload, deploy, list, and delete ONNX models.
  • Deploy ONNX models directly from an MLflow Tracking Server.
  • Inference via REST and gRPC with identical request/response semantics.
  • Inference endpoints
    • REST: POST /inference/infer/{model_name}
    • gRPC: InferenceService/Predict (Full proto)

gRPC FQMN: nexon.grpc.inference.v1.InferenceService/Predict

  • Envoy front door (single :8080 for REST + gRPC), admin UI on :9901.
  • Health: REST /healthz (liveness), /readyz (readiness) and gRPC Health service.
  • Frontend: the React UI invokes REST model management endpoints via Envoy on :8080.
  • Shared components: both services use the same database module, inference orchestrator, and model cache.
  • Reproducible stubs: proto files are compiled into Python gRPC stubs at build time, packaged as a wheel, and installed.
  • Modern, modular Python layout suitable for benchmarking and coursework.
  • Docker containerization with health checks and multi-stage builds for gRPC stubs.
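As a sketch of the parity between the two transports (the request bodies below are illustrative only; the actual payload schema depends on your model's input signature and the proto definition, and the model name "mnist" is a placeholder):

```shell
# REST inference via Envoy (hypothetical JSON payload)
curl -X POST http://localhost:8080/inference/infer/mnist \
  -H "Content-Type: application/json" \
  -d '{"input": [[0.0, 0.1, 0.2]]}'

# gRPC inference via Envoy, using grpcurl and the fully qualified method name
grpcurl -plaintext \
  -d '{"model_name": "mnist", "input": [0.0, 0.1, 0.2]}' \
  localhost:8080 nexon.grpc.inference.v1.InferenceService/Predict
```

Both requests go through the same :8080 front door and hit the same shared orchestrator, which is what makes the request/response semantics identical.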

🔧 Prerequisites

This project requires a running Docker environment.
Please follow the official guide for your operating system below:

macOS

For macOS, install Docker Desktop.

Linux

For Linux, install Docker Desktop or the Docker Engine suitable for your distribution.

Windows

A complete setup on Windows requires installing WSL, then Docker Desktop with the WSL 2 backend enabled.

  1. Install WSL (Ubuntu): Official Guide: Install WSL
  2. Install Docker Desktop: Official Guide: Docker Desktop for Windows
  3. Enable WSL 2 Backend: Official Guide: Enable WSL 2 Backend

📦 Installation (Docker – recommended)

1. Clone the Repository

git clone https://github.com/Uni-Stuttgart-ESE/nexon.git
cd nexon

2. Prepare Environment

Create .env at the repo root (copy from .env.example):

# PowerShell / bash / zsh (recommended)
docker run --rm -v "${PWD}:/w" alpine:3 sh -lc 'cp /w/.env.example /w/.env'

For testing, no value changes are needed, but it is advised to change the passwords.

Important keys:

  • NEXON_MONGO_DB: database name (default: onnx_platform).
  • LOG_HEALTH: 1 logs health probes; 0 suppresses noisy health access logs.
  • ENABLE_REFLECTION: 1 to enable gRPC reflection (dev convenience).
  • GRPC_BIND, GRPC_MAX_RECV_BYTES, GRPC_MAX_SEND_BYTES: advanced gRPC tuning.
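A minimal .env using only the keys listed above might look like this (the GRPC_BIND value is illustrative; .env.example remains the authoritative list of keys and defaults):

```shell
# .env (sketch; see .env.example for the full key list)
NEXON_MONGO_DB=onnx_platform   # database name (documented default)
LOG_HEALTH=0                   # suppress noisy health-probe access logs
ENABLE_REFLECTION=1            # gRPC reflection, handy with grpcurl in dev
GRPC_BIND=0.0.0.0:50051        # advanced tuning; value shown is illustrative
```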

3. Start MLflow services

To build and start the MLflow Tracking Server, S3, MySQL, and the initial MLflow experiments, use:

docker compose -f mlflow-compose.yml up --build -d

4. Start NEXON

To build and start the NEXON frontend, backend, and MongoDB, use:

docker compose -f nexon-compose.yml up --build -d

5. Use integration

  • Check MLflow to make sure the initial models are registered.
  • Run example requests from the examples/ directory:

curl -X POST http://localhost:8000/api/mlflow/sync -H "Content-Type: application/json" -d @examples/test_step_1.json

Optionally, for more readable responses:

curl -X POST http://localhost:8000/api/mlflow/sync -H "Content-Type: application/json" -d @examples/test_step_1.json | python -m json.tool

  • Check NEXON to see your deployed models.

6. What's Running

  • Envoy (gateway): http://localhost:8080
  • REST API docs (via Envoy): http://localhost:8080/docs
  • REST service (direct): http://localhost:8000 (HTTP/1.1)
  • gRPC service (direct): localhost:50051 (HTTP/2)
  • MongoDB: localhost:27017
  • Envoy admin: http://localhost:9901
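With the stack up, the health endpoints from the feature list can be probed directly (assuming the ports above; the gRPC check uses the standard grpc.health.v1 protocol mentioned in the features, via grpcurl):

```shell
# REST liveness and readiness via Envoy
curl -fsS http://localhost:8080/healthz
curl -fsS http://localhost:8080/readyz

# gRPC Health service on the direct gRPC port
grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check
```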

Status & logs:

docker compose ps
docker compose logs -f rest
docker compose logs -f grpc
docker compose logs -f envoy

Note: gRPC stubs are generated during the Docker build into /app/server/stubs/, packaged as a wheel, and installed into the image. They are not committed to git.
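For orientation, stub generation of this kind is typically a single grpc_tools.protoc invocation; the paths below are assumptions based on the repository layout described later in this README, not the exact command the Docker build runs:

```shell
# Sketch: compile the protos into Python stubs (paths are illustrative)
python -m grpc_tools.protoc \
  -I server/grpc_service/protos \
  --python_out=build/stubs \
  --grpc_python_out=build/stubs \
  server/grpc_service/protos/*.proto
```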


🧱 Local Development (optional)

Platform note

  • macOS/Linux: run make dev-bootstrap.
  • Windows: use WSL2 (Ubuntu) and run the same make commands.

1) One-time dev setup

# from repo root
make dev-bootstrap
# - creates .env if missing (defaults)
# - creates .venv, installs runtime + dev deps
# - generates protobuf/gRPC stubs, builds & installs the wheel
# - installs the app in editable mode and runs sanity checks

2) Start services locally (separate terminals)

MongoDB

make run-mongo-native

REST (FastAPI)

make run-rest

gRPC

make run-grpc

Envoy (local)

# Uses localhost backends (8000/50051)
make run-envoy-dev

Frontend (REST-only)

cd frontend
npm install
npm start
# The UI calls REST model management endpoints (via Envoy on :8080).

πŸ› οΈ Developer note (IDE imports)

The gRPC stubs (inference_pb2*) are generated inside the images. Your local IDE may still show unresolved imports if it isn't using the container's interpreter.

  • Quick fix: after make dev-bootstrap, point your IDE at .venv/bin/python (on native Windows: .venv\Scripts\python.exe).

IDE setup tips (optional):

  • PyCharm/IntelliJ: Settings → Project: Python Interpreter → Add → Existing → select .venv/bin/python
  • VS Code: Command Palette → "Python: Select Interpreter" → choose .venv

No change is needed to run via Docker; this is just for editor IntelliSense.

🧩 Architecture at a Glance

nexon/
├─ ops/envoy/
│  ├─ envoy.compose.yaml     # Docker routing (service names: rest, grpc)
│  ├─ envoy.dev.yaml         # Local routing (localhost:8000 / :50051)
│  └─ logs/                  # access logs
├─ server/
│  ├─ rest/                  # FastAPI REST service; exposes /inference, /upload, /deployment
│  ├─ grpc_service/          # Async gRPC service; protos in ./protos; stubs packaged as a wheel at build time
│  ├─ shared/
│  │  ├─ database.py         # MongoDB (Motor) + GridFS clients
│  │  ├─ orchestrator.py     # shared inference orchestration
│  │  └─ model_cache.py      # ONNX Runtime session cache (LRU/TTL)
│  └─ tools/                 # CLI test clients & micro-benchmarks
└─ docker-compose.yml        # mongo + rest + grpc + envoy
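The shared model_cache.py is described as an LRU/TTL cache for ONNX Runtime sessions. As a rough illustration of that eviction policy (a minimal sketch, not the actual implementation; class and method names are invented here), such a cache can be built on collections.OrderedDict:

```python
import time
from collections import OrderedDict


class SessionCache:
    """Minimal LRU cache with per-entry TTL (illustrative sketch only)."""

    def __init__(self, max_size=4, ttl_seconds=300.0):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._entries = OrderedDict()  # model name -> (session, inserted_at)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        session, inserted_at = entry
        if time.monotonic() - inserted_at > self.ttl:
            del self._entries[name]  # entry expired: drop it
            return None
        self._entries.move_to_end(name)  # mark as most recently used
        return session

    def put(self, name, session):
        self._entries[name] = (session, time.monotonic())
        self._entries.move_to_end(name)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
```

Because both the REST and gRPC services call into the same orchestrator, a single in-process cache like this means a model session warmed by one transport is reused by the other.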

🧪 Testing & Reproducibility

This project includes two primary guides for validation:

  • NEXON: Test Client
    This guide provides a simple CLI client for smoke testing and micro-benchmarking. Use it for quick validation and lightweight performance checks.
  • NEXON: Local Testing & Evaluation Guide
    This is the primary guide for formal evaluation. It contains the locally reproducible test suite with scripts for generating key evidence artifacts referenced in the thesis.

Acknowledgments

This work extends the original NEXON project by Hussein Megahed (UI and initial REST workflow).

Key contributions in this research extension:

  • gRPC Inference Service: low-latency, high-throughput inference (establishes a foundation for multiple communication protocols)
  • Envoy gateway: unified ingress on :8080
  • Shared components (used by both REST & gRPC):
    • Centralized database module
    • Inference orchestrator
    • In-process model cache for ONNX Runtime sessions
  • REST workflow hardening: added health/readiness, OpenAPI/Swagger documentation, modular sub-apps
  • Docker containerization and a reproducible protobuf/gRPC stubs pipeline
