LLM-driven evolutionary optimizer for JAX Pallas TPU kernels. Automatically discovers high-performance kernel variants through iterative mutation, evaluation on real TPU hardware (GKE TPU v7x), multi-signal profiling, and structured reflection.
Glaucis operates as an interactive Claude Code skill loop:
```
                pallas-evolve:start
                         |
           +-------------+-------------+
           |                           |
     THINK (LLM)                Read config &
  Generate N kernel             create GitHub
  variant mutations             tracking Issue
           |                           |
           +-------------+-------------+
                         |
                pallas-evolve:submit
             Build payloads, create K8s
             Job on GKE, collect results
                         |
                pallas-evolve:analyze
              Multi-signal bottleneck
              classification, top-K
                 lineage selection
                         |
                pallas-evolve:reflect
             Extract failure patterns &
             successful optimizations
                   into AGENT.md
                         |
                      compact
                   (loop back)
```
Each iteration:

- Think -- Generate N kernel variants by mutating code between `EVOLVE-BLOCK-START`/`EVOLVE-BLOCK-END` markers, guided by learnings in `AGENT.md`
- Submit -- Package variants into a K8s Job, deploy to GKE TPU v7x, run a 5-stage evaluation pipeline (compile, correctness, performance, XPlane trace, deep IR analysis)
- Analyze -- Classify bottlenecks using compute ratio, VLIW bundles, MXU utilization, arithmetic intensity, HBM bandwidth, register spills, and more. Select top-K lineages for the next round.
- Reflect -- Record failure patterns (`[Fxxx]`) and successful optimizations (`[Sxxx]`) in `AGENT.md`. Post a round summary to the GitHub Issue.
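The Think phase's marker handling can be sketched as follows (an illustrative stand-in for `mutation.py`; the actual extraction/injection logic may differ):

```python
# Marker-based mutation sketch: only the region between the markers is
# ever rewritten by the LLM; everything outside stays fixed.
START = "# EVOLVE-BLOCK-START"
END = "# EVOLVE-BLOCK-END"

def extract_block(source: str) -> str:
    """Return the mutable region between the markers (markers excluded)."""
    start = source.index(START) + len(START)
    end = source.index(END, start)
    return source[start:end]

def inject_block(source: str, new_block: str) -> str:
    """Replace the mutable region with an LLM-generated variant."""
    start = source.index(START) + len(START)
    end = source.index(END, start)
    return source[:start] + new_block + source[end:]
```

Keeping the markers themselves intact means a variant can always be re-mutated in a later round.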
```
Glaucis/
├── kernel-evolve/                    # Core Python package
│   ├── src/kernel_evolve/
│   │   ├── config.py                 # Pydantic YAML config validation
│   │   ├── mutation.py               # EVOLVE-BLOCK extraction/injection
│   │   ├── evaluator.py              # EvalResult/EvalRequest data classes
│   │   ├── kube_evaluator.py         # K8s Job submission via kubectl
│   │   ├── profiler.py               # XPlane trace & IR analysis (624 lines)
│   │   └── docker_evaluate_helpers.py  # Batch subprocess dispatch
│   ├── docker/
│   │   ├── Dockerfile                # TPU evaluation container
│   │   └── evaluate.py               # 5-stage evaluator (runs inside K8s Pod)
│   ├── examples/
│   │   ├── matmul.yaml               # Simple tiled matmul config
│   │   ├── chunk_gla.yaml            # Chunked Gated Linear Attention config
│   │   ├── gmm_fp8_blockwise.yaml    # Grouped MatMul FP8 config
│   │   └── kernels/                  # Template & reference implementations
│   ├── plugins/pallas-evolve/        # Claude Code skill plugin
│   │   └── skills/{start,submit,analyze,reflect}/
│   ├── scripts/                      # Utility scripts
│   └── tests/                        # 13 test files
├── .github/
│   ├── workflows/
│   │   ├── kernel-eval.yaml          # TPU evaluation workflow
│   │   └── build-image.yaml          # Docker image build & push
│   └── ci/
│       ├── kernel-eval-job.yaml      # K8s Job template
│       ├── kernel-eval-gmm-job.yaml  # GMM FP8 variant Job template
│       └── xplane-explore-job.yaml   # XPlane exploration Job
├── AGENT.md                          # Accumulated optimization learnings
├── docs/plans/                       # Design & implementation documents
└── LICENSE                           # Apache 2.0
```
- Python >= 3.10
- `kubectl` configured for a GKE cluster with TPU v7x nodes
- GCS bucket for profile artifact storage (default: `glaucis-profiles`)
- Claude Code CLI installed
```shell
cd kernel-evolve
pip install -e ".[dev]"
```

Optional extras:

```shell
pip install -e ".[charts]"   # matplotlib for visualization
pip install -e ".[profile]"  # xprof for trace analysis
```

The pallas-evolve plugin ships with 6 skills that drive the optimization loop inside Claude Code.
Option A: Local plugin (for development)

Add to your project's `.claude/settings.local.json`:

```json
{
  "enabledPlugins": {
    "pallas-evolve@local": true
  }
}
```

Claude Code will discover the plugin from `kernel-evolve/plugins/pallas-evolve/.claude-plugin/plugin.json`.

Option B: From repository

```json
{
  "enabledPlugins": {
    "pallas-evolve@https://github.com/sii-xinglong/Glaucis": true
  }
}
```

Launch Claude Code in the project directory. You should see these skills available:
| Skill | Purpose |
|---|---|
| `pallas-evolve:init-kernel` | Initialize a kernel project from upstream |
| `pallas-evolve:profile-brief` | Generate a profile brief from TPU eval artifacts |
| `pallas-evolve:start` | Start an optimization session |
| `pallas-evolve:submit` | Submit a batch of variants for TPU eval |
| `pallas-evolve:analyze` | Analyze batch evaluation results |
| `pallas-evolve:reflect` | Record learnings to AGENT.md |
In Claude Code, invoke the start skill with a config:
```
/start examples/matmul.yaml
```
This will:
- Parse the config and validate shapes/correctness settings
- Create a GitHub Issue for tracking
- Enter the think-submit-analyze-reflect loop
Two run modes are available:

- Step-by-step (default) -- Pauses after each phase for human review
- Autonomous -- Runs the full loop unattended until max iterations or the target speedup is reached
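In autonomous mode, the control flow reduces to a simple loop. The sketch below is hypothetical, with stub phase functions standing in for the Claude Code skills (the real loop is driven by the skills themselves, not a Python driver):

```python
# Stub phases for illustration only; in Glaucis each is a Claude Code skill.
def think():
    return ["variant-a", "variant-b"]          # generate N mutations

def submit(variants):
    return {v: 1.1 for v in variants}          # speedup per variant (stubbed)

def analyze(results):
    return max(results.values())               # best speedup this round

def reflect(results):
    pass                                       # record [Fxxx]/[Sxxx] learnings

def run_session(max_iterations: int = 20, target_speedup: float = 2.0) -> float:
    """Hypothetical autonomous driver: loop until max iterations or target."""
    best = 1.0
    for _ in range(max_iterations):
        variants = think()
        results = submit(variants)
        best = max(best, analyze(results))
        reflect(results)
        if best >= target_speedup:
            break
    return best
```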
Create a YAML config pointing to your kernel template and reference:

```yaml
kernel:
  name: "my_kernel"
  template: "kernels/my_kernel.py"
  reference: "kernels/my_kernel_ref.py"
  markers:
    start: "# EVOLVE-BLOCK-START"
    end: "# EVOLVE-BLOCK-END"

shapes:
  - { M: 1024, N: 1024, K: 1024 }

correctness:
  atol: 1e-2
  rtol: 1e-2

evaluator:
  namespace: "default"
  job_template: ".github/ci/kernel-eval-job.yaml"
  repo: "sii-xinglong/Glaucis"
  branch: "main"
  tpu:
    cluster: "tpu7x-cluster"
    zone: "us-central1"

session:
  max_iterations: 20
  output_dir: "runs/my_kernel"
  batch:
    variants_per_round: 4
    top_k: 2
    max_active_lineages: 6
```

Mark the mutable region in your kernel template:
```python
# EVOLVE-BLOCK-START
def my_kernel(x_ref, y_ref, o_ref):
    # ... kernel code to be optimized ...
# EVOLVE-BLOCK-END
```

| Kernel | Description | Config |
|---|---|---|
| `matmul` | Tiled matrix multiplication | `examples/matmul.yaml` |
| `chunk_gla` | Chunked Gated Linear Attention (fwd + bwd) | `examples/chunk_gla.yaml` |
| `gmm_fp8_blockwise` | Grouped MatMul with FP8 block-wise quantization | `examples/gmm_fp8_blockwise.yaml` |
The evaluator runs inside a K8s Pod on GKE TPU v7x and executes 5 stages:

- Compile -- `exec()` the kernel, `jax.jit` + lower to HLO
- Correctness -- Compare against the reference implementation with `np.testing.assert_allclose`
- Performance -- 10 warmup + 50 timed iterations; report median latency and speedup
- XPlane Trace -- JAX profiler capture: compute ratio, memory transfer ratio, per-unit utilization (MXU, Scalar ALU, Vector ALU, Vector Load/Store)
- Deep IR -- HLO/LLO/Mosaic dump parsing: VLIW bundles, MXU distribution, VMEM allocations, bundle density, DMA analysis, HBM bandwidth, FLOP counts, arithmetic intensity
Profile artifacts (HLO, LLO, trace events) are uploaded to GCS for post-hoc analysis.
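The performance stage's timing discipline can be approximated with a framework-agnostic sketch (the real evaluator times `jax.jit`-compiled calls and blocks on device completion; `median_latency` is an illustrative name):

```python
import statistics
import time

def median_latency(fn, warmup: int = 10, iters: int = 50) -> float:
    """10 warmup runs absorb compilation and caching effects, then the
    median of 50 timed runs is reported; the median resists outlier jitter
    better than the mean."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)
```

Speedup is then the reference's median latency divided by the variant's.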
| Component | Detail |
|---|---|
| Cloud | Google Cloud Platform |
| Cluster | tpu7x-cluster (GKE, us-central1) |
| TPU | v7x (Ironwood), 2x2x1 topology, 4 chips |
| Peak Compute | 2307 TFLOPS (BF16) |
| HBM Bandwidth | 3690 GB/s |
| Container Registry | us-central1-docker.pkg.dev/tpu-service-473302/glaucis/kernel-eval |
| Artifact Storage | GCS bucket glaucis-profiles |
| CI Auth | Workload Identity Federation for GitHub Actions |
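The peak numbers above fix the roofline ridge point: a kernel needs roughly 2307e12 / 3690e9 ≈ 625 FLOP per HBM byte of arithmetic intensity before it is compute-bound rather than bandwidth-bound. A quick sanity check (simple roofline model using only the table's figures):

```python
PEAK_FLOPS = 2307e12  # BF16 peak, TPU v7x (from the table above)
HBM_BW = 3690e9       # HBM bandwidth in bytes/s

def attainable_tflops(arithmetic_intensity: float) -> float:
    """Roofline model: attainable TFLOP/s at a given arithmetic
    intensity (FLOP per HBM byte moved)."""
    return min(PEAK_FLOPS, arithmetic_intensity * HBM_BW) / 1e12

ridge = PEAK_FLOPS / HBM_BW  # ~625 FLOP/byte: the compute/memory crossover
```

This is why the analyze phase tracks arithmetic intensity alongside MXU utilization: kernels well below the ridge point cannot reach peak compute no matter how well they schedule.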
```shell
cd kernel-evolve

# Run tests
pytest

# Lint
ruff check src/ tests/
ruff format --check src/ tests/
```

Apache 2.0 -- see LICENSE.