feat: Add multi-stage multi-arch Containerfile #46

Open
FreddyFunk wants to merge 1 commit into NVIDIA:main from FreddyFunk:feat/containerfile

Conversation

@FreddyFunk

What does this PR do?

Adds a multi-stage, multi-arch Containerfile that produces two image targets: a python-export image for model quantization and ONNX export and a leaner runtime image with only the C++ binaries for engine building and inference. No GPU is required at build time, enabling CI and headless server builds. Also adds a documentation page covering build instructions, usage examples and available export tools.

Relates to #32. The file is named Containerfile rather than Dockerfile to reflect that it is an OCI-compliant container image, compatible with both Podman and Docker.
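For readers skimming the conversation without opening the diff, a minimal sketch of the three-stage layout this PR describes (python-export, cpp-build, runtime). The base images, package names, and paths below are illustrative assumptions, not the PR's actual contents:

```dockerfile
# Sketch only -- stage names match the PR; everything else is assumed.

# Stage 1: Python environment for model quantization and ONNX export
FROM docker.io/library/python:3.12-slim AS python-export
WORKDIR /workspace
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Stage 2: compile the C++ binaries (no GPU required at build time)
FROM docker.io/library/ubuntu:24.04 AS cpp-build
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /src
COPY . .
RUN cmake -S . -B build && cmake --build build --parallel

# Stage 3: lean runtime image carrying only the built binaries,
# which is why it is the default target and much smaller
FROM docker.io/library/ubuntu:24.04 AS runtime
COPY --from=cpp-build /src/build/bin/ /usr/local/bin/
WORKDIR /workspace/output
```

Because the runtime stage copies only the compiled binaries out of cpp-build, the build toolchain and Python export dependencies never reach the default image, which is what keeps it lean.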

Type of change: new feature, documentation

Overview:

Usage

# Build the lean C++ runtime image (default)
podman build -t tensorrt-edgellm .

# Build the Python export pipeline image
podman build --target python-export -t tensorrt-edgellm-export .

# Cross-build for aarch64 on an x86_64 host
podman build --platform linux/arm64 -t tensorrt-edgellm-arm64 .

# Run with GPU access (Podman CDI)
podman run --rm -it --device nvidia.com/gpu=all \
  -v ./workspace:/workspace/output tensorrt-edgellm

# Run with GPU access (Docker)
docker run --rm -it --gpus all \
  -v ./workspace:/workspace/output tensorrt-edgellm

Testing

All three image builds were verified on an x86_64 host with an RTX 3090 and on an x86_64 host without an NVIDIA GPU:

  • podman build -t tensorrt-edgellm . --> builds successfully (10.4 GB)
  • podman build --target python-export -t tensorrt-edgellm-export . --> builds successfully (19 GB)
  • podman build --platform linux/arm64 -t tensorrt-edgellm-arm64 . --> cross-builds successfully via qemu-user-static (10.4 GB, arm64)
  • GPU passthrough validated with both Podman (CDI) and Docker (--gpus all)
  • Exports and GPU-accelerated inference work as expected

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No. Container builds are validated manually; no CI container build pipeline exists yet in this project, but one could be added in a separate pull request.
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: Yes

Additional Information

I verified arm64 container builds via cross-compilation, but I do not have any arm64 hardware capable of connecting to an NVIDIA GPU. I expect everything to work just fine, but I suggest that someone with access to arm64 hardware test this before merging this pull request.

Add a Containerfile with three build stages (python-export, cpp-build,
runtime) supporting x86_64 and aarch64, along with a .containerignore
and container usage documentation.

Relates to NVIDIA#32

Signed-off-by: Frederic Laing <dev@fredfunk.tech>
@FreddyFunk FreddyFunk requested a review from a team March 8, 2026 18:24
@nvluxiaoz
Collaborator

Thanks a lot for your MR! The TensorRT Edge-LLM team has committed to providing some Docker recommendations, but not a fixed Dockerfile for now, since Dockerfiles require careful maintenance and carry infrastructure costs. The team will re-evaluate the need for this and decide whether this MR should be merged.

@nvluxiaoz
Collaborator

Thanks for your PR! Our team will evaluate later whether this MR's approach is the right one, or whether to create Dockerfiles for the environment ourselves.
