A learning project that walks through the full pipeline for deploying large foundation vision models on NVIDIA Jetson devices using ONNX and TensorRT.
```
PyTorch Model  →  ONNX Export      →  TensorRT Engine   →  Optimized Inference
 (models.py)      (export_onnx.py)     (build_engine.py)     (inference.py)
```
- Load models — from HuggingFace Transformers and timm (models.py)
- Export to ONNX — convert PyTorch models to a portable graph format (export_onnx.py)
- Build TensorRT engines — optimize ONNX graphs for the target GPU (build_engine.py)
- Run inference — compare PyTorch, ONNX Runtime, and TensorRT backends (inference.py)
- Benchmark — measure latency, throughput, and memory (benchmark.py)
| Model | Source | Type | Parameters |
|---|---|---|---|
| ViT-Base (patch16) | Transformers | Classification | ~86M |
| DINOv2-Base | Transformers | Feature Extraction | ~86M |
| Swin-Base | Transformers | Classification | ~88M |
| EfficientNet-B0 | timm | Classification | ~5M |
| ConvNeXt-Base | timm | Classification | ~89M |
| ViT-Large (patch16) | timm | Classification | ~304M |
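A catalog like the table above is easy to represent in code. The following is a hypothetical sketch of such a registry; the actual field names and entries in models.py may differ:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    name: str        # identifier used for ONNX/engine filenames
    source: str      # "transformers" or "timm"
    task: str        # "classification" or "feature-extraction"
    input_size: int  # square input resolution the model expects

CATALOG = {
    "vit-base-patch16": ModelSpec("vit-base-patch16", "transformers", "classification", 224),
    "dinov2-base": ModelSpec("dinov2-base", "transformers", "feature-extraction", 224),
    "efficientnet-b0": ModelSpec("efficientnet-b0", "timm", "classification", 224),
}

def spec(name: str) -> ModelSpec:
    """Look up a model by name, with a helpful error for typos."""
    try:
        return CATALOG[name]
    except KeyError:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(CATALOG)}")
```

Centralizing this metadata means the export, engine-build, and benchmark scripts can all iterate over one source of truth.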
```bash
uv sync

# Install Jetson-specific packages (from NVIDIA's repos)
pip install onnx onnxsim onnxruntime-gpu tensorrt pycuda

# Install this project
uv sync
```

Run the examples in order:
```bash
# 1. Explore available models and their properties
python examples/01_explore_models.py

# 2. Export all models to ONNX format
python examples/02_export_onnx.py

# 3. Build TensorRT engines (Jetson only)
python examples/03_build_engines.py

# 4. Compare inference outputs across backends
python examples/04_run_inference.py [optional_image.jpg]

# 5. Run full benchmark suite
python examples/05_benchmark.py --num-runs 100
```

The cpp/ directory contains a native C++ implementation using TensorRT's C++ API directly. This is the typical approach for production Jetson deployments where you need maximum performance and minimal dependencies.
```bash
cd cpp
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# Run inference
./trt_infer ../output/engines/vit-base-patch16_fp16.engine cat.jpg

# Run benchmark
./trt_benchmark ../output/engines/vit-base-patch16_fp16.engine 500
```

The rust/ directory uses ONNX Runtime's Rust bindings (the ort crate) with TensorRT as an execution provider. This gives you TensorRT's optimizations through a safe Rust API.
```bash
cd rust
cargo build --release

# Run inference
cargo run --release --bin infer -- --model ../output/onnx/vit-base-patch16.onnx --image cat.jpg

# Run benchmark
cargo run --release --bin benchmark -- --model ../output/onnx/vit-base-patch16.onnx --num-runs 200
```

src/deployment/ — Python: full pipeline (export, build, infer, benchmark)
```
models.py         Model catalog, loading, and preprocessing
export_onnx.py    ONNX export, validation, and simplification
build_engine.py   TensorRT engine building and INT8 calibration
inference.py      Unified inference across PyTorch/ORT/TRT backends
benchmark.py      Latency, throughput, and memory benchmarking
```
examples/ — Python example scripts (run in order)
```
01_explore_models.py   Inspect model architectures and outputs
02_export_onnx.py      Export models to ONNX
03_build_engines.py    Build TensorRT engines
04_run_inference.py    Run and compare inference
05_benchmark.py        Full benchmark suite
```
cpp/ — C++: native TensorRT inference
```
CMakeLists.txt              Build system (requires CUDA, TensorRT, OpenCV)
include/engine.hpp          TensorRT engine wrapper (RAII)
include/preprocessing.hpp   Image preprocessing (OpenCV)
src/engine.cpp              Engine loading, buffer management, inference
src/preprocessing.cpp       BGR→RGB, resize, normalize, HWC→CHW
src/main.cpp                CLI inference tool
src/benchmark_main.cpp      Latency benchmark tool
```
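The preprocessing chain listed above (BGR→RGB, resize, normalize, HWC→CHW) can be sketched in numpy. This is an illustrative sketch, not the project's code; it assumes the standard ImageNet mean/std, which these models commonly expect, and uses a crude nearest-neighbor resize in place of OpenCV's interpolation:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(bgr: np.ndarray, size: int = 224) -> np.ndarray:
    """HWC uint8 BGR image -> 1xCxHxW float32 tensor, ImageNet-normalized."""
    rgb = bgr[..., ::-1]                      # BGR -> RGB (reverse channel axis)
    h, w = rgb.shape[:2]
    rows = np.arange(size) * h // size        # nearest-neighbor resize indices
    cols = np.arange(size) * w // size
    resized = rgb[rows][:, cols]              # (size, size, 3)
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    chw = normalized.transpose(2, 0, 1)       # HWC -> CHW
    return chw[np.newaxis]                    # add batch dimension
```

Getting this chain byte-for-byte consistent across the Python, C++, and Rust implementations is what makes their outputs comparable.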
rust/ — Rust: ORT + TensorRT inference
```
Cargo.toml             Dependencies (ort, ndarray, image, clap)
src/lib.rs             Library root
src/engine.rs          ORT session with TensorRT execution provider
src/preprocessing.rs   Image preprocessing (pure Rust)
src/main.rs            CLI inference tool
src/benchmark.rs       Latency benchmark tool
```
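All three benchmark tools follow the same measurement pattern: warm up, time many runs, report percentiles. A minimal sketch of that pattern (hypothetical function and key names; note that real GPU benchmarking also needs device synchronization, e.g. CUDA events, before reading the clock):

```python
import statistics
import time

def benchmark(run, num_runs: int = 100, warmup: int = 10) -> dict:
    """Time a callable and report latency percentiles and throughput."""
    for _ in range(warmup):                   # warmup: let caches/clocks settle
        run()
    samples_ms = []
    for _ in range(num_runs):
        t0 = time.perf_counter()
        run()
        samples_ms.append((time.perf_counter() - t0) * 1e3)
    samples_ms.sort()
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "p50_ms": samples_ms[len(samples_ms) // 2],
        "p99_ms": samples_ms[int(len(samples_ms) * 0.99) - 1],
        "throughput_ips": 1e3 / statistics.fmean(samples_ms),
    }
```

Reporting p50/p99 rather than just the mean matters on Jetson, where thermal throttling and DVFS can produce long-tail latencies.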