34 changes: 34 additions & 0 deletions CODEGEN_BENCHMARK.md
@@ -0,0 +1,34 @@
# Code Generation Benchmark Results

**Hardware:** AMD RX 9070 XT (gfx1201) | ROCm | 16GB VRAM
**Date:** 2026-06-07

## C# WinForms Code Generation

| Model | Status | Notes |
|-------|--------|-------|
| Devstral 2.5B | ✅ **PASS** | Generated standalone complete code, compiled successfully |
| Qwen2.5-7B | ❌ Partial | Uses partial class (incomplete for standalone) |
| Starcoder2-15B | ❌ Incomplete | Generated placeholder code only |
| Gemma4-MOE | ❌ Truncated | Hit token limit, incomplete |
| DeepSeek-Coder-V2 | ❌ Partial | Uses partial class |
| GLM-4-7 | Pending | Large model, still loading |
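
The C# results above hinge on whether the generated source is standalone: output that declares a `partial class` is counted as incomplete because it presumes a separate designer file. A minimal sketch of how that check might be automated is shown here. The function name `uses_partial_class` is hypothetical and not part of this PR, and the regex is a heuristic that does not skip comments or string literals.

```python
import re

# Heuristic: matches the C# `partial class` modifier pair anywhere in the source.
# Comments and string literals are not stripped, so false positives are possible.
_PARTIAL_CLASS = re.compile(r"\bpartial\s+class\b")


def uses_partial_class(csharp_source: str) -> bool:
    """Return True if the generated C# source declares a partial class."""
    return _PARTIAL_CLASS.search(csharp_source) is not None
```

A result of `True` only flags the file for review; whether it still compiles on its own depends on what the class body references.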

## Python Tkinter Code Generation

| Model | Status | Notes |
|-------|--------|-------|
| Qwen2.5-7B | ✅ **PASS** | Complete runnable Tkinter Notepad app |

## C++ Win32 Code Generation

| Model | Status | Notes |
|-------|--------|-------|
| Qwen2.5-7B | ⚠️ Partial | Basic skeleton, missing menu implementation |

## Output Files

- `codegen_output/devstral_notepad.exe` - Working compiled C# app
- `codegen_output/devstral_notepad.cs` - Generated source
- `codegen_output/py_notepad.py` - Complete Python app
- `codegen_output/deepseek_notepad.cs` - Code requiring designer file
36 changes: 24 additions & 12 deletions README.md
@@ -9,8 +9,8 @@ A clean, traceable, single-branch fork of Ollama with native AMD Radeon RX 9070
This V5 update addresses the core inference hangs and kernel crashes on RX 9070 XT (gfx1201).

### The Fixes
- 1. **DLL Mismatch Fix (The Hang):** The system's default `amdhip64.dll` shipped with Windows drivers does not perfectly match the ROCm 6.2+ SDK requirements, leading to silent hangs during `llama-server` context creation.
- - **Solution:** We explicitly ship and use `amdhip64_7.dll` (from the ROCm toolkit) renamed to `amdhip64.dll` in the `lib/ollama/rocm/` folder. This forces `ggml-hip.dll` to link against the correct driver interface.
+ 1. **DLL Mismatch Fix (The Hang):** The system's default `amdhip64.dll` shipped with Windows drivers does not perfectly match the **ROCm 7.x** SDK requirements, leading to silent hangs during `llama-server` context creation.
+ - **Solution:** We explicitly ship and use `amdhip64_7.dll` (from the ROCm 7.x toolkit) renamed to `amdhip64.dll` in the `lib/ollama/rocm/` folder. This forces `ggml-hip.dll` to link against the correct driver interface.
2. **rocWMMA Disabled (The Crash):** Hardware matrix cores (rocWMMA) on early RDNA4 drivers cause instability and severe performance regressions (up to 73%).
- **Solution:** `rocWMMA` is explicitly disabled at compile time via `-DGGML_HIP_ROCWMMA=OFF`.
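
The DLL fix in item 1 amounts to copying `amdhip64_7.dll` into `lib/ollama/rocm/` under the name `amdhip64.dll`. A minimal sketch of that staging step follows. The function name `stage_hip_runtime` and its parameters `rocm_bin` and `ollama_root` are hypothetical, and the actual packaging script used by this fork is not shown in the PR.

```python
import shutil
from pathlib import Path


def stage_hip_runtime(rocm_bin: Path, ollama_root: Path) -> Path:
    """Copy amdhip64_7.dll into lib/ollama/rocm/ under the name amdhip64.dll.

    rocm_bin and ollama_root are assumed locations supplied by the caller.
    """
    source = rocm_bin / "amdhip64_7.dll"
    if not source.is_file():
        raise FileNotFoundError(f"ROCm runtime not found: {source}")

    target_dir = ollama_root / "lib" / "ollama" / "rocm"
    target_dir.mkdir(parents=True, exist_ok=True)

    # The rename happens here: the destination file name differs from the source.
    target = target_dir / "amdhip64.dll"
    shutil.copy2(source, target)
    return target
```

Raising on a missing source file makes the step fail loudly instead of leaving the system DLL in place.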

@@ -195,6 +195,20 @@ These are **stable, reproducible** numbers on a reference AMD Radeon RX 9070 XT
**vs. Stock Ollama**: ~4x improvement in generation speed, preventing the silent CPU fallback.
**vs. v4 patches**: ~15% faster thanks to conditional safety and disabled rocWMMA matrix cores.

+ ### Code Generation Quality Tests
+
+ Code generation benchmark on RX 9070 XT:
+
+ | Model | C# | Python | C++ |
+ |-------|-----|--------|-----|
+ | Devstral 2.5B | ✅ PASS | - | - |
+ | Qwen2.5-7B | ⚠️ Partial | ✅ PASS | ⚠️ Partial |
+ | Starcoder2-15B | ❌ Incomplete | - | - |
+ | DeepSeek-Coder-V2 | ⚠️ Partial | - | - |
+ | GLM-4-7 | Pending | - | - |
+
+ See `CODEGEN_BENCHMARK.md` for full details.
+
---

## Troubleshooting
@@ -301,19 +315,17 @@ source ./scripts/env_gfx1201.sh

## Dashboard

- Build with `-tags dashboard` to include the web dashboard:
+ The RDNA4 Dashboard is built-in and served automatically.

- ```bash
- go build -tags dashboard -o ollama ./cmd/ollama
- ```
+ Ensure `dashboard.html` is in the same directory as your `ollama` executable (or running from the source root).

- Then open `http://localhost:11434/dashboard/` while Ollama is running.
+ Then open `http://localhost:11434/dashboard/` in your browser while Ollama is running.

- The dashboard shows live GPU metrics from `/api/dashboard/gpu`:
- - Temperature, VRAM, utilization (read from rocm-smi/sysfs)
- - Active optimization status
- - Configuration warnings
- - Performance metrics
+ The dashboard automatically polls the API to show:
+ - Live Temperature, VRAM, and GPU utilization
+ - Real-time generation speed (tok/s) and memory bandwidth
+ - Active optimization status (Flash Attention, Hip Graphs, Wave32)
+ - Configuration warnings (like TdrDelay warnings on Windows)
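
The PR does not show how the dashboard derives its tok/s figure. One plausible approach, sketched here as an assumption, uses the `eval_count` and `eval_duration` fields that Ollama's generate API includes in its final response (the duration is in nanoseconds). The function name `tokens_per_second` is hypothetical.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from a final Ollama generate response.

    Assumes the response carries `eval_count` (generated tokens) and
    `eval_duration` (generation time in nanoseconds).
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        # Avoid division by zero when no generation time was recorded.
        return 0.0
    return eval_count * 1e9 / eval_duration_ns
```

For example, a response reporting 100 tokens over 2,000,000,000 ns corresponds to 50 tok/s.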

---

Binary file added codegen_output/devstral_notepad.exe
Binary file not shown.
13 changes: 13 additions & 0 deletions codegen_results.txt
@@ -0,0 +1,13 @@
# Final CodeGen Results
======================

C# WinForms Notepad:
- Devstral: PASS (standalone, compiled to exe)
- DeepSeek-Coder-V2: Partial class (incomplete)
- Qwen/Starcoder/Gemma4: Partial class or placeholder

Python Tkinter Notepad:
- Qwen-2.5-7B: PASS (complete, runnable)

C++ Win32 Notepad:
- Qwen-2.5-7B: Partial skeleton (needs menu implementation)