Merged
64 commits
c494685
mlir init
Menooker May 6, 2026
56ad948
kunir -> kungpu
Menooker May 6, 2026
fe88c9f
memory planning
Menooker May 6, 2026
618ed08
kunir.func
Menooker May 6, 2026
dd459e5
kungpu to llvm
Menooker May 7, 2026
7e62458
pipeline to llvm
Menooker May 7, 2026
e4271d8
ptx backend
Menooker May 7, 2026
cbd4517
JIT
Menooker May 7, 2026
443b85e
fix unaligned stocks
Menooker May 7, 2026
82177f8
multi-kernel
Menooker May 8, 2026
89b6645
pybind
Menooker May 8, 2026
84b3376
use upstream gpu-to-binary. Support math functions
Menooker May 8, 2026
eab54af
backref, fast windowed sum
Menooker May 8, 2026
ac21bdd
rename library
Menooker May 8, 2026
9ab700f
add executor
Menooker May 8, 2026
8f4b7f6
partition
Menooker May 8, 2026
97fc8a0
enable cs_rank
Menooker May 12, 2026
05173a7
remove cuda interface array support
Menooker May 12, 2026
d83754c
Merge remote-tracking branch 'origin/main' into mlir
Menooker May 13, 2026
a4830f2
nanobind
Menooker May 13, 2026
b8c04b8
time slice (mlir part)
Menooker May 13, 2026
5cc38e3
time slice (python side)
Menooker May 13, 2026
3203b11
time slice on rank
Menooker May 14, 2026
da52d62
boolean ops
Menooker May 14, 2026
40be63f
refactor kunir-to-kungpu + test backwindow outer-ts read
Menooker May 14, 2026
228b062
Add constant op
Menooker May 14, 2026
aea24d3
accumulator
Menooker May 14, 2026
0a610db
fix require whole time
Menooker May 14, 2026
5a304a4
fix warnings
Menooker May 15, 2026
45a2d86
align with cpu rungraph and compileit. Fix mask time length
Menooker May 15, 2026
9d08b0c
basic runtime test
Menooker May 18, 2026
1b4595b
log, pow tests. mean-std WIP
Menooker May 18, 2026
9f1b9a5
fix time index
Menooker May 18, 2026
0e3d988
enable f64
Menooker May 18, 2026
fa9267a
argmin/max/tsrank
Menooker May 18, 2026
accb202
WindowLoopIndex
Menooker May 18, 2026
2f7394c
ema and linear regression
Menooker May 18, 2026
55f9c7a
alpha158. Fix Output(Temp(...))
Menooker May 19, 2026
b73aa91
optimize partitioner: avoid Input -> WindowedTempOutput -> Output par…
Menooker May 19, 2026
0522771
Merge branch 'main' of https://github.com/Menooker/KunQuant into mlir
Menooker May 19, 2026
9eef8c1
is_whole_time_required for talib
Menooker May 19, 2026
5cf1810
Merge branch 'main' of https://github.com/Menooker/KunQuant into mlir
Menooker May 21, 2026
50b1f5e
alpha101, fix time slice temp_window/output (should not use output)
Menooker May 21, 2026
3fee085
scale op
Menooker May 21, 2026
3092e15
alpha101 benchmark
Menooker May 22, 2026
8e7d361
cuda graph
Menooker May 22, 2026
d5f3ff0
llvm workflow
Menooker May 23, 2026
6a3b399
checkout tag
Menooker May 23, 2026
bf20f5a
checkout tag
Menooker May 23, 2026
f0a78a2
make it run on sm61 and llvm 22
Menooker May 24, 2026
49bc311
Merge branch 'mlir' of https://github.com/Menooker/KunQuant into mlir
Menooker May 24, 2026
a78602b
fix tests
Menooker May 25, 2026
9069046
enable dynlink
Menooker May 25, 2026
d0e63c9
shared executable data
Menooker May 25, 2026
d328fd9
dump to file
Menooker May 25, 2026
fb8c8cb
lit driven tests
Menooker May 25, 2026
360f67a
enable ci
Menooker May 25, 2026
1f6127c
cuda cache
Menooker May 25, 2026
6abd039
split packages
Menooker May 25, 2026
bbd2b8f
publish
Menooker May 26, 2026
ef1eab8
copy buffer in bench
Menooker May 26, 2026
2ad5762
overlap runner
Menooker May 26, 2026
00718c9
bundle tests
Menooker May 26, 2026
c493476
Merge branch 'main' of https://github.com/Menooker/KunQuant into mlir
Menooker May 26, 2026
30 changes: 30 additions & 0 deletions .github/workflows/Dockerfile.mlir
@@ -0,0 +1,30 @@
FROM quay.io/pypa/manylinux_2_28_x86_64:2025.05.16-1

RUN rm -rf /opt/_internal/pipx/venvs/cmake \
&& dnf install -y \
ca-certificates \
cmake \
curl \
dnf-plugins-core \
libxml2-devel \
libzstd-devel \
ninja-build \
pkgconf-pkg-config \
zlib-devel \
&& dnf config-manager --add-repo \
https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo \
&& dnf install -y cuda-toolkit-13-2 \
&& dnf clean all \
&& rm -rf /var/cache/dnf

RUN /opt/python/cp312-cp312/bin/python -m pip install --no-cache-dir lit \
&& ln -s /opt/python/cp312-cp312/bin/lit /usr/local/bin/lit

ENV CUDA_PATH=/usr/local/cuda-13.2
ENV CUDA_HOME=/usr/local/cuda-13.2
ENV CUDAToolkit_ROOT=/usr/local/cuda-13.2
ENV CMAKE_CUDA_COMPILER=/usr/local/cuda-13.2/bin/nvcc
ENV PATH=/usr/local/cuda-13.2/bin:${PATH}
ENV LD_LIBRARY_PATH=/usr/local/cuda-13.2/lib64:/usr/local/cuda-13.2/lib64/stubs

RUN ln -sf libcuda.so /usr/local/cuda-13.2/lib64/stubs/libcuda.so.1
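The CI workflow changed later in this PR checks for a handful of files under `/usr/local/cuda-13.2` before building. A similar check could be run inside the image built from this Dockerfile. The sketch below assumes the image uses the same toolkit layout, which this diff does not confirm; the helper name `missing_cuda_files` is hypothetical.

```python
from pathlib import Path

# Relative paths that the "Prepare CUDA 13.2" step in ccpp.yml tests for.
# Whether the Docker image provides all of them is an assumption.
EXPECTED_CUDA_FILES = [
    "bin/nvcc",
    "bin/ptxas",
    "include/cuda.h",
    "include/cuda_runtime.h",
    "lib64/stubs/libcuda.so",
    "nvvm/libdevice/libdevice.10.bc",
]


def missing_cuda_files(root):
    """Return the expected toolkit files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_CUDA_FILES if not (root / rel).exists()]


if __name__ == "__main__":
    # An empty list means every expected file was found.
    print(missing_cuda_files("/usr/local/cuda-13.2"))
```

Returning the list of missing paths, rather than failing on the first one, makes it easier to see at a glance which package is absent.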
58 changes: 25 additions & 33 deletions .github/workflows/build-llvm.yml
@@ -24,15 +24,18 @@ jobs:
const content = fs.readFileSync('mlir/llvm_commit.txt', 'utf8');
const commit = content.split(/\r?\n/).find(l => !/^\s*#/.test(l) && l.trim() !== '') || '';
if (!commit) throw new Error('No commit found in mlir/llvm_commit.txt');
return commit;
core.setOutput('commit', commit.trim());
return commit.trim();

- name: Checkout llvm-project at commit
uses: actions/checkout@v4
with:
repository: llvm/llvm-project
ref: ${{ steps.read_commit.outputs.result }}
# treat the value as a tag name and fetch tags so shallow fetch can find it
ref: refs/tags/${{ steps.read_commit.outputs.commit }}
path: llvm-project
fetch-depth: 1 # shallow clone, only the specified ref
fetch-depth: 1
fetch-tags: true

- name: Install build dependencies
run: |
@@ -49,7 +52,10 @@ jobs:
-DLLVM_ENABLE_PROJECTS="mlir" \
-DLLVM_TARGETS_TO_BUILD="NVPTX" \
-DCMAKE_BUILD_TYPE=MinSizeRel \
-DCMAKE_INSTALL_PREFIX=install
-DLLVM_BUILD_TOOLS=OFF \
-DLLVM_BUILD_TESTS=ON \
-DLLVM_INSTALL_UTILS=ON \
-DCMAKE_INSTALL_PREFIX=${{ github.workspace }}/build-static/install

- name: Configure (dynamic)
if: matrix.link == 'dynamic'
@@ -58,11 +64,14 @@ jobs:
-DLLVM_ENABLE_PROJECTS="mlir" \
-DLLVM_TARGETS_TO_BUILD="NVPTX" \
-DCMAKE_BUILD_TYPE=MinSizeRel \
-DMLIR_BUILD_LLVM_DYLIB=ON \
-DMLIR_LINK_LLVM_DYLIB=ON \
-DLLVM_BUILD_TOOLS=OFF \
-DMLIR_BUILD_MLIR_DYLIB=ON \
-DMLIR_LINK_MLIR_DYLIB=ON \
-DLLVM_BUILD_LLVM_DYLIB=ON \
-DLLVM_LINK_LLVM_DYLIB=ON \
-DCMAKE_INSTALL_PREFIX=install
-DLLVM_BUILD_TESTS=ON \
-DLLVM_INSTALL_UTILS=ON \
-DCMAKE_INSTALL_PREFIX=${{ github.workspace }}/build-dynamic/install

- name: Build and install
run: |
@@ -71,7 +80,7 @@ jobs:
- name: Create archive of install
run: |
mkdir -p artifacts
COMMIT=${{ steps.read_commit.outputs.result }}
COMMIT=${{ steps.read_commit.outputs.commit }}
tar -C build-${{ matrix.link }} -czf artifacts/llvm-mlir-install-${{ matrix.link }}-${COMMIT}.tar.gz install
echo "Created artifacts/llvm-mlir-install-${{ matrix.link }}-${COMMIT}.tar.gz"

@@ -110,30 +119,13 @@ jobs:
const content = fs.readFileSync('mlir/llvm_commit.txt', 'utf8');
const commit = content.split(/\r?\n/).find(l => !/^\s*#/.test(l) && l.trim() !== '') || '';
if (!commit) throw new Error('No commit found in mlir/llvm_commit.txt');
core.setOutput('commit', commit);
return commit;

- name: Create GitHub Release
id: create_release
uses: actions/create-release@v1
with:
tag_name: llvm-mlir-${{ steps.read_commit_publish.outputs.result }}
release_name: LLVM-MLIR ${{ steps.read_commit_publish.outputs.result }}
body: "Automated build artifacts for LLVM+MLIR"
draft: false
prerelease: false

- name: Upload static asset to release
uses: actions/upload-release-asset@v1
with:
upload_url: ${{ steps.create_release.outputs.upload_url }}
asset_path: artifacts/llvm-mlir-install-static-${{ steps.read_commit_publish.outputs.result }}.tar.gz
asset_name: llvm-mlir-install-static-${{ steps.read_commit_publish.outputs.result }}.tar.gz
asset_content_type: application/gzip

- name: Upload dynamic asset to release
uses: actions/upload-release-asset@v1
- name: Release
uses: softprops/action-gh-release@v3
with:
upload_url: ${{ steps.create_release.outputs.upload_url }}
asset_path: artifacts/llvm-mlir-install-dynamic-${{ steps.read_commit_publish.outputs.result }}.tar.gz
asset_name: llvm-mlir-install-dynamic-${{ steps.read_commit_publish.outputs.result }}.tar.gz
asset_content_type: application/gzip
tag_name: llvm-mlir-${{ steps.read_commit_publish.outputs.commit }}
name: LLVM-MLIR prebuilt ${{ steps.read_commit_publish.outputs.commit }}
files: |
artifacts/llvm-mlir-install-static-${{ steps.read_commit_publish.outputs.commit }}.tar.gz
artifacts/llvm-mlir-install-dynamic-${{ steps.read_commit_publish.outputs.commit }}.tar.gz
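Both the build job and the publish job read the pinned LLVM ref from `mlir/llvm_commit.txt` with the same rule: take the first line that is neither blank nor a `#` comment, then trim it. A minimal Python sketch of that rule follows. The function name `read_pinned_ref` and the sample tag `llvmorg-0.0.0-example` are hypothetical and do not reflect the real file contents.

```python
from typing import Iterable


def read_pinned_ref(lines: Iterable[str]) -> str:
    """Return the first non-blank, non-comment line, with whitespace trimmed."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and lines whose first non-space character is '#'.
        if stripped and not stripped.startswith("#"):
            return stripped
    raise ValueError("No commit found in mlir/llvm_commit.txt")


# Hypothetical file contents, for illustration only.
sample = "# pinned LLVM ref\n\n  llvmorg-0.0.0-example  \nignored-second-line\n"
ref = read_pinned_ref(sample.splitlines())
print(ref)
```

Trimming matters here because the value is interpolated into `refs/tags/...` and into the archive file name, where stray whitespace or a trailing `\r` would produce a ref or path that does not exist.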
128 changes: 127 additions & 1 deletion .github/workflows/ccpp.yml
@@ -110,4 +110,130 @@ jobs:
- name: Alpha158 test
working-directory: ./
run: |
python ./tests/test_alpha158.py --inputs ./input.npz --ref ./alpha158.npz --action run_avx2
python ./tests/test_alpha158.py --inputs ./input.npz --ref ./alpha158.npz --action run_avx2
cuda-mlir:
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- uses: actions/setup-python@v5
with:
python-version: '3.12'
cache: 'pip'
- name: Install build dependencies
run: |
set -eux
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential \
ca-certificates \
cmake \
curl \
git \
libxml2-dev \
libzstd-dev \
ninja-build \
pkg-config \
zlib1g-dev
- name: Cache CUDA 13.2 toolkit
id: cache-cuda
uses: actions/cache@v4
with:
path: .cache/cuda-13.2
key: ${{ runner.os }}-${{ runner.arch }}-cuda-mlir-minimal-13.2-v1
- name: Install CUDA 13.2
if: steps.cache-cuda.outputs.cache-hit != 'true'
run: |
set -eux
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-ubuntu2404.pin \
-o /tmp/cuda-ubuntu2404.pin
sudo install -m 0644 /tmp/cuda-ubuntu2404.pin /etc/apt/preferences.d/cuda-repository-pin-600
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-ubuntu2404-keyring.gpg \
-o /tmp/cuda-archive-keyring.gpg
sudo install -m 0644 /tmp/cuda-archive-keyring.gpg /usr/share/keyrings/cuda-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/ /" \
| sudo tee /etc/apt/sources.list.d/cuda-ubuntu2404-x86_64.list
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
cuda-nvcc-13-2 \
cuda-cudart-dev-13-2 \
cuda-driver-dev-13-2 \
libnvvm-13-2
mkdir -p .cache
sudo tar -C /usr/local -cf "$RUNNER_TEMP/cuda-13.2.tar" cuda-13.2
tar -C .cache -xf "$RUNNER_TEMP/cuda-13.2.tar"
- name: Prepare CUDA 13.2
run: |
set -eux
if [ ! -d /usr/local/cuda-13.2 ]; then
sudo ln -s "$GITHUB_WORKSPACE/.cache/cuda-13.2" /usr/local/cuda-13.2
fi
sudo ln -sf libcuda.so /usr/local/cuda-13.2/lib64/stubs/libcuda.so.1
test -x /usr/local/cuda-13.2/bin/nvcc
test -x /usr/local/cuda-13.2/bin/ptxas
test -f /usr/local/cuda-13.2/include/cuda.h
test -f /usr/local/cuda-13.2/include/cuda_runtime.h
test -f /usr/local/cuda-13.2/lib64/stubs/libcuda.so
test -f /usr/local/cuda-13.2/nvvm/libdevice/libdevice.10.bc
/usr/local/cuda-13.2/bin/nvcc --version
- name: Install Python dependencies
run: |
python -m pip install --upgrade pip
python -m pip install numpy==1.26.4 lit cupy-cuda13x
lit --version
- name: Read LLVM tag
id: llvm_tag
run: |
tag="$(sed -e 's/#.*//' -e '/^[[:space:]]*$/d' mlir/llvm_commit.txt | head -n1 | tr -d '[:space:]')"
test -n "$tag"
echo "tag=$tag" >> "$GITHUB_OUTPUT"
echo "LLVM tag: $tag"
- name: Download prebuilt LLVM/MLIR
env:
LLVM_TAG: ${{ steps.llvm_tag.outputs.tag }}
run: |
set -eux
mkdir -p "$RUNNER_TEMP/llvm-mlir"
curl -fL --retry 3 \
"https://github.com/Menooker/KunQuant/releases/download/llvm-mlir-${LLVM_TAG}/llvm-mlir-install-static-${LLVM_TAG}.tar.gz" \
-o "$RUNNER_TEMP/llvm-mlir.tar.gz"
tar -xzf "$RUNNER_TEMP/llvm-mlir.tar.gz" -C "$RUNNER_TEMP/llvm-mlir" --strip-components=1
test -f "$RUNNER_TEMP/llvm-mlir/lib/cmake/mlir/MLIRConfig.cmake"
test -f "$RUNNER_TEMP/llvm-mlir/lib/cmake/llvm/LLVMConfig.cmake"
echo "LLVM_PREFIX=$RUNNER_TEMP/llvm-mlir" >> "$GITHUB_ENV"
- name: Configure MLIR backend
env:
CUDA_PATH: /usr/local/cuda-13.2
CUDA_HOME: /usr/local/cuda-13.2
run: |
cmake -S . -B build/mlir-ci -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DKUN_BUILD_CPU_RUNNER=OFF \
-DKUN_BUILD_MLIR=ON \
-DLLVM_DIR="$LLVM_PREFIX/lib/cmake/llvm" \
-DMLIR_DIR="$LLVM_PREFIX/lib/cmake/mlir" \
-DCUDAToolkit_ROOT=/usr/local/cuda-13.2 \
-DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.2/bin/nvcc \
-DPython_EXECUTABLE="$(python -c 'import sys; print(sys.executable)')" \
-DPYTHON_EXECUTABLE="$(python -c 'import sys; print(sys.executable)')" \
-DLLVM_EXTERNAL_LIT="$(command -v lit)"
- name: Run KunQuant MLIR tests
env:
CUDA_PATH: /usr/local/cuda-13.2
CUDA_HOME: /usr/local/cuda-13.2
run: cmake --build build/mlir-ci --target check-kun-mlir --parallel 4
- name: Check KunQuant-MLIR imports
env:
CUDA_PATH: /usr/local/cuda-13.2
CUDA_HOME: /usr/local/cuda-13.2
run: |
export LD_LIBRARY_PATH="$LLVM_PREFIX/lib:/usr/local/cuda-13.2/lib64/stubs:${LD_LIBRARY_PATH:-}"
python - <<'PY'
import KunQuantMLIR.KunMLIR as direct
from KunQuant.jit import KunMLIR as compat
import KunQuant.jit.KunMLIR as submodule
assert direct.__file__ == compat.__file__ == submodule.__file__
assert "/KunQuantMLIR/" in direct.__file__
print(direct.__file__)
PY
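The download step depends on the naming scheme that `build-llvm.yml` uses for release tags and archives. The sketch below rebuilds the same URL in Python so the scheme is easy to see. The helper name `release_asset_url` and the example tag are hypothetical; the URL pattern is copied from the `curl` command above.

```python
def release_asset_url(tag, link="static", repo="Menooker/KunQuant"):
    """Build the download URL for a prebuilt LLVM/MLIR archive.

    `tag` is the value read from mlir/llvm_commit.txt; `link` selects the
    static or dynamic build produced by the build-llvm workflow.
    """
    if link not in ("static", "dynamic"):
        raise ValueError(f"unknown link type: {link!r}")
    release = f"llvm-mlir-{tag}"
    asset = f"llvm-mlir-install-{link}-{tag}.tar.gz"
    return f"https://github.com/{repo}/releases/download/{release}/{asset}"


# Hypothetical tag, for illustration only.
url = release_asset_url("llvmorg-0.0.0-example")
print(url)
```

The archive is created with `tar -C build-<link> ... install`, so its single top-level directory is `install`. That is why the workflow extracts it with `--strip-components=1`.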
30 changes: 27 additions & 3 deletions .github/workflows/docker.yml
@@ -4,16 +4,34 @@ name: Create and publish a Docker image
# Configures this workflow to run every time a change is pushed to the branch called `release`.
on:
workflow_dispatch:
inputs:
image:
description: Image to build
required: true
default: both
type: choice
options:
- core
- mlir
- both

# Defines two custom environment variables for the workflow. These are used for the Container registry domain, and a name for the Docker image that this workflow builds.
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}

# There is a single job in this workflow. It's configured to run on the latest available version of Ubuntu.
jobs:
build-and-push-image:
runs-on: ubuntu-latest
strategy:
matrix:
include:
- target: core
image: ghcr.io/menooker/kunquant
dockerfile: Dockerfile
- target: mlir
image: ghcr.io/menooker/kunquant-mlir
dockerfile: Dockerfile.mlir
# Sets the permissions granted to the `GITHUB_TOKEN` for the actions in this job.
permissions:
contents: read
@@ -23,37 +41,43 @@ jobs:
#
steps:
- name: Checkout repository
if: ${{ inputs.image == matrix.target || inputs.image == 'both' }}
uses: actions/checkout@v4
# Uses the `docker/login-action` action to log in to the Container registry registry using the account and password that will publish the packages. Once published, the packages are scoped to the account defined here.
- name: Log in to the Container registry
if: ${{ inputs.image == matrix.target || inputs.image == 'both' }}
uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
# This step uses [docker/metadata-action](https://github.com/docker/metadata-action#about) to extract tags and labels that will be applied to the specified image. The `id` "meta" allows the output of this step to be referenced in a subsequent step. The `images` value provides the base name for the tags and labels.
- name: Extract metadata (tags, labels) for Docker
if: ${{ inputs.image == matrix.target || inputs.image == 'both' }}
id: meta
uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
images: ${{ matrix.image }}
# This step uses the `docker/build-push-action` action to build the image, based on your repository's `Dockerfile`. If the build succeeds, it pushes the image to GitHub Packages.
# It uses the `context` parameter to define the build's context as the set of files located in the specified path. For more information, see [Usage](https://github.com/docker/build-push-action#usage) in the README of the `docker/build-push-action` repository.
# It uses the `tags` and `labels` parameters to tag and label the image with the output from the "meta" step.
- name: Build and push Docker image
if: ${{ inputs.image == matrix.target || inputs.image == 'both' }}
id: push
uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
with:
context: .github/workflows
file: .github/workflows/${{ matrix.dockerfile }}
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}

# This step generates an artifact attestation for the image, which is an unforgeable statement about where and how it was built. It increases supply chain security for people who consume the image. For more information, see [Using artifact attestations to establish provenance for builds](/actions/security-guides/using-artifact-attestations-to-establish-provenance-for-builds).
- name: Generate artifact attestation
if: ${{ inputs.image == matrix.target || inputs.image == 'both' }}
uses: actions/attest-build-provenance@v2
with:
subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME}}
subject-name: ${{ matrix.image }}
subject-digest: ${{ steps.push.outputs.digest }}
push-to-registry: true
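Every step in the job carries the same condition, `inputs.image == matrix.target || inputs.image == 'both'`. The sketch below models which matrix entries actually build for a given input. The function name `entries_to_build` is hypothetical; the matrix values are copied from the diff.

```python
# Matrix entries as declared under strategy.matrix.include in docker.yml.
MATRIX = [
    {"target": "core", "image": "ghcr.io/menooker/kunquant", "dockerfile": "Dockerfile"},
    {"target": "mlir", "image": "ghcr.io/menooker/kunquant-mlir", "dockerfile": "Dockerfile.mlir"},
]


def entries_to_build(image_input):
    """Return the matrix entries whose steps would run for this input."""
    return [
        entry
        for entry in MATRIX
        if image_input == entry["target"] or image_input == "both"
    ]


for choice in ("core", "mlir", "both"):
    print(choice, [e["dockerfile"] for e in entries_to_build(choice)])
```

Because the condition is attached to each step rather than to the job, both matrix jobs still start for every dispatch; the job for the unselected image simply skips all of its steps.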
