A learning project that walks through the full pipeline for deploying large foundation vision models on NVIDIA Jetson devices using ONNX and TensorRT.
```
PyTorch Model  →  ONNX Export      →  TensorRT Engine   →  Optimized Inference
 (models.py)      (export_onnx.py)     (build_engine.py)     (inference.py)
```
- Load models — from HuggingFace Transformers and timm (models.py)
- Export to ONNX — convert PyTorch models to a portable graph format (export_onnx.py)
- Build TensorRT engines — optimize ONNX graphs for the target GPU (build_engine.py)
- Run inference — compare PyTorch, ONNX Runtime, and TensorRT backends (inference.py)
- Benchmark — measure latency, throughput, and memory (benchmark.py)
| Model | Source | Type | Parameters |
|---|---|---|---|
| ViT-Base (patch16) | Transformers | Classification | ~86M |
| DINOv2-Base | Transformers | Feature Extraction | ~86M |
| Swin-Base | Transformers | Classification | ~88M |
| EfficientNet-B0 | timm | Classification | ~5M |
| ConvNeXt-Base | timm | Classification | ~89M |
| ViT-Large (patch16) | timm | Classification | ~304M |
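A catalog like the table above is easy to represent in code. The following is a hypothetical sketch of such a registry; the actual field names and entries in models.py may differ:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    name: str        # identifier used for ONNX/engine filenames
    source: str      # "transformers" or "timm"
    task: str        # "classification" or "feature-extraction"
    input_size: int  # square input resolution the model expects

CATALOG = {
    "vit-base-patch16": ModelSpec("vit-base-patch16", "transformers", "classification", 224),
    "dinov2-base": ModelSpec("dinov2-base", "transformers", "feature-extraction", 224),
    "efficientnet-b0": ModelSpec("efficientnet-b0", "timm", "classification", 224),
}

def spec(name: str) -> ModelSpec:
    """Look up a model by name, with a helpful error for typos."""
    try:
        return CATALOG[name]
    except KeyError:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(CATALOG)}")
```

Centralizing this metadata means the export, engine-build, and benchmark scripts can all iterate over one source of truth.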
```bash
uv sync

# Install Jetson-specific packages (from NVIDIA's repos)
pip install onnx onnxsim onnxruntime-gpu tensorrt pycuda

# Install this project
uv sync
```

Run the examples in order:
```bash
# 1. Explore available models and their properties
python examples/01_explore_models.py

# 2. Export all models to ONNX format
python examples/02_export_onnx.py

# 3. Build TensorRT engines (Jetson only)
python examples/03_build_engines.py

# 4. Compare inference outputs across backends
python examples/04_run_inference.py [optional_image.jpg]

# 5. Run full benchmark suite
python examples/05_benchmark.py --num-runs 100
```

The cpp/ directory contains a native C++ implementation using TensorRT's C++ API directly. This is the typical approach for production Jetson deployments where you need maximum performance and minimal dependencies.
```bash
cd cpp
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# Run inference
./trt_infer ../output/engines/vit-base-patch16_fp16.engine cat.jpg

# Run benchmark
./trt_benchmark ../output/engines/vit-base-patch16_fp16.engine 500
```

The rust/ directory uses ONNX Runtime's Rust bindings (the ort crate) with TensorRT as an execution provider. This gives you TensorRT's optimizations through a safe Rust API.
```bash
cd rust
cargo build --release

# Run inference
cargo run --release --bin infer -- --model ../output/onnx/vit-base-patch16.onnx --image cat.jpg

# Run benchmark
cargo run --release --bin benchmark -- --model ../output/onnx/vit-base-patch16.onnx --num-runs 200
```

src/deployment/ — Python: full pipeline (export, build, infer, benchmark)
```
models.py         Model catalog, loading, and preprocessing
export_onnx.py    ONNX export, validation, and simplification
build_engine.py   TensorRT engine building and INT8 calibration
inference.py      Unified inference across PyTorch/ORT/TRT backends
benchmark.py      Latency, throughput, and memory benchmarking
```
examples/ — Python example scripts (run in order)
```
01_explore_models.py   Inspect model architectures and outputs
02_export_onnx.py      Export models to ONNX
03_build_engines.py    Build TensorRT engines
04_run_inference.py    Run and compare inference
05_benchmark.py        Full benchmark suite
```
cpp/ — C++: native TensorRT inference
```
CMakeLists.txt              Build system (requires CUDA, TensorRT, OpenCV)
include/engine.hpp          TensorRT engine wrapper (RAII)
include/preprocessing.hpp   Image preprocessing (OpenCV)
src/engine.cpp              Engine loading, buffer management, inference
src/preprocessing.cpp       BGR→RGB, resize, normalize, HWC→CHW
src/main.cpp                CLI inference tool
src/benchmark_main.cpp      Latency benchmark tool
```
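The preprocessing chain listed above (BGR→RGB, resize, normalize, HWC→CHW) can be sketched in numpy. This is an illustrative sketch, not the project's code; it assumes the standard ImageNet mean/std, which these models commonly expect, and uses a crude nearest-neighbor resize in place of OpenCV's interpolation:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(bgr: np.ndarray, size: int = 224) -> np.ndarray:
    """HWC uint8 BGR image -> 1xCxHxW float32 tensor, ImageNet-normalized."""
    rgb = bgr[..., ::-1]                      # BGR -> RGB (reverse channel axis)
    h, w = rgb.shape[:2]
    rows = np.arange(size) * h // size        # nearest-neighbor resize indices
    cols = np.arange(size) * w // size
    resized = rgb[rows][:, cols]              # (size, size, 3)
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    chw = normalized.transpose(2, 0, 1)       # HWC -> CHW
    return chw[np.newaxis]                    # add batch dimension
```

Getting this chain byte-for-byte consistent across the Python, C++, and Rust implementations is what makes their outputs comparable.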
rust/ — Rust: ORT + TensorRT inference
```
Cargo.toml             Dependencies (ort, ndarray, image, clap)
src/lib.rs             Library root
src/engine.rs          ORT session with TensorRT execution provider
src/preprocessing.rs   Image preprocessing (pure Rust)
src/main.rs            CLI inference tool
src/benchmark.rs       Latency benchmark tool
```
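All three benchmark tools follow the same measurement pattern: warm up, time many runs, report percentiles. A minimal sketch of that pattern (hypothetical function and key names; note that real GPU benchmarking also needs device synchronization, e.g. CUDA events, before reading the clock):

```python
import statistics
import time

def benchmark(run, num_runs: int = 100, warmup: int = 10) -> dict:
    """Time a callable and report latency percentiles and throughput."""
    for _ in range(warmup):                   # warmup: let caches/clocks settle
        run()
    samples_ms = []
    for _ in range(num_runs):
        t0 = time.perf_counter()
        run()
        samples_ms.append((time.perf_counter() - t0) * 1e3)
    samples_ms.sort()
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "p50_ms": samples_ms[len(samples_ms) // 2],
        "p99_ms": samples_ms[int(len(samples_ms) * 0.99) - 1],
        "throughput_ips": 1e3 / statistics.fmean(samples_ms),
    }
```

Reporting p50/p99 rather than just the mean matters on Jetson, where thermal throttling and DVFS can produce long-tail latencies.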