
Vision Model Deployment on NVIDIA Jetson

A learning project that walks through the full pipeline for deploying large foundation vision models on NVIDIA Jetson devices using ONNX and TensorRT.

Pipeline Overview

PyTorch Model → ONNX Export → TensorRT Engine → Optimized Inference
  (models.py)   (export_onnx.py) (build_engine.py)  (inference.py)
  1. Load models from HuggingFace Transformers and timm (models.py)
  2. Export to ONNX — convert PyTorch models to a portable graph format (export_onnx.py)
  3. Build TensorRT engines — optimize ONNX graphs for the target GPU (build_engine.py)
  4. Run inference — compare PyTorch, ONNX Runtime, and TensorRT backends (inference.py)
  5. Benchmark — measure latency, throughput, and memory (benchmark.py)

Models Included

Model                Source        Type                Parameters
ViT-Base (patch16)   Transformers  Classification      ~86M
DINOv2-Base          Transformers  Feature Extraction  ~86M
Swin-Base            Transformers  Classification      ~88M
EfficientNet-B0      timm          Classification      ~5M
ConvNeXt-Base        timm          Classification      ~89M
ViT-Large (patch16)  timm          Classification      ~304M
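A catalog like the one above lends itself to a simple lookup table. A hypothetical sketch in the spirit of models.py; the dictionary keys and field names here are illustrative, not the project's actual API:

```python
# Hypothetical model catalog mirroring the table above.
MODEL_CATALOG = {
    "vit-base-patch16":  {"source": "transformers", "task": "classification",     "params_m": 86},
    "dinov2-base":       {"source": "transformers", "task": "feature-extraction", "params_m": 86},
    "swin-base":         {"source": "transformers", "task": "classification",     "params_m": 88},
    "efficientnet-b0":   {"source": "timm",         "task": "classification",     "params_m": 5},
    "convnext-base":     {"source": "timm",         "task": "classification",     "params_m": 89},
    "vit-large-patch16": {"source": "timm",         "task": "classification",     "params_m": 304},
}

def models_from(source: str) -> list[str]:
    """Return the catalog names provided by a given source library."""
    return [name for name, info in MODEL_CATALOG.items() if info["source"] == source]
```

Keeping the catalog declarative makes it easy for the export and benchmark scripts to iterate over every model uniformly.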

Setup

On Mac (development only — no TensorRT/ONNX)

uv sync

On Jetson (full pipeline)

# Install Jetson-specific packages (from NVIDIA's repos)
pip install onnx onnxsim onnxruntime-gpu tensorrt pycuda

# Install this project
uv sync

Usage

Run the examples in order:

# 1. Explore available models and their properties
python examples/01_explore_models.py

# 2. Export all models to ONNX format
python examples/02_export_onnx.py

# 3. Build TensorRT engines (Jetson only)
python examples/03_build_engines.py

# 4. Compare inference outputs across backends
python examples/04_run_inference.py [optional_image.jpg]

# 5. Run full benchmark suite
python examples/05_benchmark.py --num-runs 100
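The measurement loop behind a benchmark like step 5 typically runs a few untimed warmup iterations, then times each inference call. A minimal stdlib sketch; the function name and reported fields are illustrative, not necessarily what benchmark.py implements:

```python
# Latency benchmark sketch: warmup, then timed runs, reporting mean / p50 / p99.
import statistics
import time

def benchmark(infer, num_runs: int = 100, warmup: int = 10) -> dict:
    for _ in range(warmup):              # untimed runs to let caches/clocks settle
        infer()
    latencies_ms = []
    for _ in range(num_runs):
        t0 = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - t0) * 1e3)
    latencies_ms.sort()
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "p50_ms": latencies_ms[len(latencies_ms) // 2],
        "p99_ms": latencies_ms[int(len(latencies_ms) * 0.99) - 1],
        "throughput_fps": 1e3 / statistics.mean(latencies_ms),
    }
```

On Jetson, warmup matters more than on desktop GPUs: the first TensorRT invocations trigger clock ramp-up and memory allocation, so unwarmed numbers can be several times slower than steady state.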

C++ Deployment

The cpp/ directory contains a native C++ implementation that uses TensorRT's C++ API directly. This is the typical approach for production Jetson deployments, where maximum performance and minimal dependencies matter.

cd cpp
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# Run inference
./trt_infer ../output/engines/vit-base-patch16_fp16.engine cat.jpg

# Run benchmark
./trt_benchmark ../output/engines/vit-base-patch16_fp16.engine 500

Rust Deployment

The rust/ directory uses ONNX Runtime's Rust bindings (ort crate) with TensorRT as an execution provider. This gives you TensorRT's optimizations through a safe Rust API.

cd rust
cargo build --release

# Run inference
cargo run --release --bin infer -- --model ../output/onnx/vit-base-patch16.onnx --image cat.jpg

# Run benchmark
cargo run --release --bin benchmark -- --model ../output/onnx/vit-base-patch16.onnx --num-runs 200

Project Structure

src/deployment/              — Python: full pipeline (export, build, infer, benchmark)
    models.py                    Model catalog, loading, and preprocessing
    export_onnx.py               ONNX export, validation, and simplification
    build_engine.py              TensorRT engine building and INT8 calibration
    inference.py                 Unified inference across PyTorch/ORT/TRT backends
    benchmark.py                 Latency, throughput, and memory benchmarking

examples/                    — Python example scripts (run in order)
    01_explore_models.py         Inspect model architectures and outputs
    02_export_onnx.py            Export models to ONNX
    03_build_engines.py          Build TensorRT engines
    04_run_inference.py          Run and compare inference
    05_benchmark.py              Full benchmark suite

cpp/                         — C++: native TensorRT inference
    CMakeLists.txt               Build system (requires CUDA, TensorRT, OpenCV)
    include/engine.hpp           TensorRT engine wrapper (RAII)
    include/preprocessing.hpp    Image preprocessing (OpenCV)
    src/engine.cpp               Engine loading, buffer management, inference
    src/preprocessing.cpp        BGR→RGB, resize, normalize, HWC→CHW
    src/main.cpp                 CLI inference tool
    src/benchmark_main.cpp       Latency benchmark tool

rust/                        — Rust: ORT + TensorRT inference
    Cargo.toml                   Dependencies (ort, ndarray, image, clap)
    src/lib.rs                   Library root
    src/engine.rs                ORT session with TensorRT execution provider
    src/preprocessing.rs         Image preprocessing (pure Rust)
    src/main.rs                  CLI inference tool
    src/benchmark.rs             Latency benchmark tool
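The preprocessing steps that both the C++ and Rust tools implement (BGR→RGB, resize, ImageNet normalization, HWC→CHW) can be sketched in NumPy. The real code uses OpenCV and the image crate respectively; the nearest-neighbor resize below is an assumption that keeps the sketch dependency-free, whereas production code would use bilinear interpolation:

```python
# NumPy sketch of the image preprocessing pipeline: uint8 HWC BGR in,
# float32 1x3xHxW tensor out, matching common ImageNet-model conventions.
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(bgr: np.ndarray, size: int = 224) -> np.ndarray:
    """HWC uint8 BGR image -> 1x3xsize xsize float32 tensor."""
    rgb = bgr[:, :, ::-1]                         # BGR -> RGB (reverse channel axis)
    h, w = rgb.shape[:2]
    rows = np.arange(size) * h // size            # nearest-neighbor resize indices
    cols = np.arange(size) * w // size
    resized = rgb[rows[:, None], cols[None, :]]
    x = resized.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD        # per-channel normalization
    return np.ascontiguousarray(x.transpose(2, 0, 1))[None]  # HWC -> CHW, add batch
```

Getting these steps byte-identical across Python, C++, and Rust is worth verifying early: a channel-order or normalization mismatch produces plausible-looking but wrong logits that are painful to debug later.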
