Native build and release pipeline for llamadart binaries.
This repository is responsible for:
- Building native llamadart binaries across platforms.
- Publishing release artifacts consumed by llamadart build hooks.
- Producing release metadata (assets.json + SHA256SUMS).
The Dart API/runtime stays in the main llamadart repository.
Native Build & Release (.github/workflows/native_release.yml)
- Manual dispatch.
- Builds one full backend set per platform/arch target.
- Fails when any enabled backend in that target fails.
- Publishes per-target native assets (Apple consolidated, others split core/backend libs).
- Generates assets.json and SHA256SUMS.
Auto Trigger Native Release (.github/workflows/auto_native_release.yml)
- Daily schedule plus manual dispatch.
- Resolves latest upstream ggml-org/llama.cpp release tag.
- Dispatches Native Build & Release only when this repo does not already have that tag and no native release run is in flight.
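The dispatch condition above can be sketched as a small pure function. This is an illustration only, not the workflow's actual implementation: `should_dispatch` and its parameters are hypothetical names, the tag strings are placeholders, and the real workflow gets these inputs from GitHub rather than from function arguments.

```python
def should_dispatch(upstream_tag, existing_release_tags, runs_in_flight):
    """Return True only when a new native release run should be started.

    Hypothetical helper; every name here is an assumption for illustration.
    """
    if not upstream_tag:
        # Nothing was resolved upstream, so there is nothing to build.
        return False
    if upstream_tag in existing_release_tags:
        # This repo already published a release for that tag.
        return False
    # Avoid stacking a second run on top of one that is still in progress.
    return runs_in_flight == 0


if __name__ == "__main__":
    # Placeholder tags, used only to exercise the function.
    print(should_dispatch("b1234", {"b1200", "b1233"}, 0))
```

Keeping the decision free of network calls makes the two guard conditions easy to test separately from the API lookups that feed them.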
The published tag is the native version contract consumed by downstream package
hooks and Swift Package manifests. When changing the upstream llama.cpp
version:
- Run Native Build & Release for the selected llama_cpp_tag, or let Auto Trigger Native Release dispatch it for the latest upstream tag.
- Verify the release contains per-platform native archives, llamadart-native-apple-xcframework-<tag>.zip, assets.json, and SHA256SUMS.
- Update downstream llamadart pins, SPM URLs, and SPM checksums together so native-assets and SPM consumers use the same wrapper/runtime build.
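For the verification step, a downloaded release directory could be checked against SHA256SUMS with a short script. This is a minimal sketch, assuming SHA256SUMS uses the conventional `sha256sum` line format (`<hex digest>  <filename>`); the exact format this repo emits is not stated here, and `parse_sha256sums` and `verify_release_dir` are hypothetical names.

```python
import hashlib
from pathlib import Path


def parse_sha256sums(text):
    """Parse sha256sum-style lines into {filename: lowercase hex digest}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(None, 1)
        # sha256sum marks binary-mode entries with a leading '*'.
        entries[name.lstrip("*")] = digest.lower()
    return entries


def verify_release_dir(release_dir):
    """Return sorted names of listed files that are missing or mismatched."""
    release_dir = Path(release_dir)
    expected = parse_sha256sums((release_dir / "SHA256SUMS").read_text())
    bad = []
    for name, digest in expected.items():
        path = release_dir / name
        if not path.is_file():
            bad.append(name)
            continue
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != digest:
            bad.append(name)
    return sorted(bad)
```

An empty return value means every file listed in SHA256SUMS is present and matches. The script does not check that the list itself is complete, so the expected archive names still need to be confirmed separately.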
Each target builds all worthy backends together in one build:
- Android: arm64 = Vulkan + OpenCL + CPU variants (Kleidi-enabled where safe); x86_64 = Vulkan + OpenCL + CPU
- iOS/macOS: Metal + CPU (consolidated into libllamadart, BLAS/Kleidi disabled)
- Linux x64: Vulkan + CUDA + BLAS + ZenDNN + CPU
- Linux arm64: Vulkan + BLAS + Kleidi + CPU
- Windows x64: Vulkan + CUDA + BLAS + CPU
- Windows arm64: Vulkan + BLAS + Kleidi + CPU
Non-Apple targets use GGML_BACKEND_DL=ON, so backend libs are optional at package/runtime level.
Release assets contain:
- Apple: consolidated libllamadart per target.
- Apple SPM: llamadart-native-apple-xcframework-<tag>.zip, a llamadart_native.xcframework built from the same Apple slices and wrapper code as the native-assets tarballs.
- Non-Apple core libs: llamadart, llama, llama-common, ggml, ggml-base (and mtmd where produced)
- Non-Apple backend libs: ggml-<backend> modules (ggml-vulkan, ggml-opencl, etc.)
- Windows backend runtime deps:
  - CUDA lanes include CUDA runtime DLLs required by ggml-cuda (for example cudart64_*.dll, cublas64_*.dll).
  - BLAS lanes include openblas*.dll required by ggml-blas.
  - NVIDIA driver DLLs (for example nvcuda.dll) are not bundled and are provided by GPU drivers.
- Headers archive: llamadart-native-headers-<tag>.tar.gz with llama_cpp/... and libllamadart/... roots, including llama.cpp, ggml, mtmd, and llama_dart_wrapper.h.
Consumers can choose which backend libs to include in their package and load at runtime.
Assets are suffixed with platform/arch, for example:
- libllamadart-linux-x64.so
- libllama-linux-x64.so
- libggml-vulkan-linux-x64.so
- libggml-opencl-android-arm64.so
- ggml-cuda-windows-x64.dll
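Because the names follow a `<name>-<platform>-<arch>` pattern, a consumer could pick the core libraries plus a chosen set of backends for one target with a small filter. This is a sketch under assumptions: the pattern is inferred only from the example names above (Apple assets may be named differently), and `parse_asset_name`, `select_assets`, and `CORE_LIBS` are hypothetical names, not part of this repository.

```python
from pathlib import PurePosixPath

# Core library names as listed in the release-asset description.
CORE_LIBS = {"llamadart", "llama", "llama-common", "ggml", "ggml-base", "mtmd"}


def parse_asset_name(filename):
    """Split '<lib?><name>-<platform>-<arch>.<ext>' into its parts."""
    path = PurePosixPath(filename)
    parts = path.stem.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected asset name: {filename}")
    name, platform, arch = parts
    if name.startswith("lib"):
        # Unix-style shared libraries carry a 'lib' prefix; the DLL example does not.
        name = name[3:]
    return {"name": name, "platform": platform, "arch": arch,
            "ext": path.suffix.lstrip(".")}


def select_assets(filenames, platform, arch, backends):
    """Keep core libs plus the requested ggml-<backend> modules for one target."""
    wanted = {f"ggml-{backend}" for backend in backends}
    selected = []
    for filename in filenames:
        info = parse_asset_name(filename)
        if (info["platform"], info["arch"]) != (platform, arch):
            continue
        if info["name"] in CORE_LIBS or info["name"] in wanted:
            selected.append(filename)
    return selected
```

For example, `select_assets(names, "linux", "x64", ["vulkan"])` keeps the Linux x64 core libraries and `libggml-vulkan-linux-x64.so` while skipping assets for other targets.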
- .github/workflows/auto_native_release.yml: daily upstream tag watcher + native release dispatcher.
- .github/workflows/native_release.yml: build + package + release.
- .gitmodules: pinned native dependency submodules.
- CMakeLists.txt + CMakePresets.json: root-native build configuration.
- src/: llama_dart_wrapper.*.
- third_party/llama.cpp: upstream llama.cpp submodule.
- third_party/Vulkan-Headers: Vulkan API headers submodule for Android Vulkan builds.
- third_party/SPIRV-Headers: SPIR-V registry headers required by the llama.cpp Vulkan backend.
- third_party/OpenCL-Headers: OpenCL headers submodule (Android OpenCL builds).
- third_party/OpenCL-ICD-Loader: OpenCL loader submodule used to produce Android libOpenCL.so when the NDK does not provide one.
- third_party/opencl-stubs: optional local fallback location for OpenCL headers/stubs.
- tools/build.py: cross-platform build entrypoint.
- tools/validate_exports.py: verifies required wrapper C exports, including MTP symbols, in release artifacts.
- tools/package_apple_xcframework.py: packages Apple libllamadart slices as an SPM-compatible XCFramework zip.
- scripts/generate_assets_manifest.sh: builds assets.json + checksums.
- docs/platform_backend_strategy.md: platform/backend matrix.
Builds are primarily driven by root CMakePresets.json via tools/build.py.
Android arm64 CPU variants use isolated CMake build directories so per-variant
ISA flags remain correct while packaging the full variant matrix. The raw
android-arm64-v8a-full preset now represents the primary arm64 build, while
tools/build.py assembles the additional CPU variant outputs.
Examples:
# macOS arm64 (Metal + CPU)
python3 tools/build.py apple --target macos-arm64
# Linux x64 (Vulkan + CUDA + BLAS + ZenDNN + CPU)
python3 tools/build.py linux --arch x64
# Android both ABIs (arm64: Vulkan + OpenCL + CPU variants; x86_64: Vulkan + OpenCL + CPU)
python3 tools/build.py android --abi all
# Windows x64 (Vulkan + CUDA + BLAS + CPU)
python3 tools/build.py windows --arch x64
# Windows arm64 (Vulkan + BLAS + Kleidi + CPU)
python3 tools/build.py windows --arch arm64
List supported combinations:
python3 tools/build.py list
Initialize submodules after clone:
git submodule update --init --recursive
MSVC release builds keep interprocedural optimization enabled by default, but
llama-common is excluded from IPO/LTCG. Upstream llama-common is a large
utility DLL, and current MSVC link.exe can access-violate while linking it
with /LTCG. The override keeps Windows release artifacts reproducible without
changing the runtime packaging model.
To retest MSVC IPO after a compiler or upstream change, configure with:
cmake --preset windows-x64-full -DLLAMADART_MSVC_LLAMA_COMMON_IPO=ON
Use tools/docker_build_linux.sh to build Linux targets in a cached Docker
image. The image is based on NVIDIA CUDA 12.8.1 and keeps heavy dependencies
(CUDA, cross toolchains, Vulkan/BLAS dev packages) in reusable layers, so repeat
builds are faster.
This Docker flow is for local development only; CI Linux jobs run on native GitHub runners.
# Linux x64 full set
./tools/docker_build_linux.sh --arch x64 --jobs 8
# Linux arm64 full set (cross-compiled in container)
./tools/docker_build_linux.sh --arch arm64 --jobs 8
# Build both Linux targets
./tools/docker_build_linux.sh --arch all --jobs 8
Useful flags:
- --clean: clean preset build directories before build
- --rebuild-image: force image refresh
- --platform: override Docker platform (default linux/amd64)
- --image: custom image tag
Outputs are written to bin/linux/x64 and bin/linux/arm64.
Note: Kleidi-enabled lanes require network access to fetch upstream Kleidi sources.
Android arm64 CPU variants are built in isolated configurations so Kleidi can
stay enabled without leaking newer ISA flags into lower-tier variant binaries.
Android OpenCL override env vars (optional):
OPENCL_INCLUDE_DIR=/path/to/opencl/headers
OPENCL_LIBRARY_ANDROID_ARM64_V8A=/path/to/arm64/libOpenCL.so
OPENCL_LIBRARY_ANDROID_X86_64=/path/to/x86_64/libOpenCL.so
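The per-ABI variable names above follow a visible pattern: the ABI name is upper-cased and its hyphens become underscores. The sketch below shows one way a script might look the overrides up. It is an assumption-laden illustration, not a description of how tools/build.py reads these variables; `opencl_overrides` is a hypothetical helper.

```python
import os


def opencl_overrides(abi, env=None):
    """Look up optional OpenCL overrides for one Android ABI.

    Hypothetical helper: only the variable names come from the list above.
    """
    env = os.environ if env is None else env
    # 'arm64-v8a' -> 'OPENCL_LIBRARY_ANDROID_ARM64_V8A'
    lib_var = "OPENCL_LIBRARY_ANDROID_" + abi.upper().replace("-", "_")
    return {
        # Unset or empty variables are reported as None (no override).
        "include_dir": env.get("OPENCL_INCLUDE_DIR") or None,
        "library": env.get(lib_var) or None,
    }


if __name__ == "__main__":
    for abi in ("arm64-v8a", "x86_64"):
        print(abi, opencl_overrides(abi))
```

Passing `env` explicitly makes the lookup testable without modifying the process environment.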
- AGENTS.md: agent workflow and cross-repo handoff
- CONTRIBUTING.md: contributor setup/build/release steps