diff --git a/CODEGEN_BENCHMARK.md b/CODEGEN_BENCHMARK.md
new file mode 100644
index 00000000000..70c6c93c498
--- /dev/null
+++ b/CODEGEN_BENCHMARK.md
@@ -0,0 +1,34 @@
+# Code Generation Benchmark Results
+
+**Hardware:** AMD RX 9070 XT (gfx1201) | ROCm | 16GB VRAM
+**Date:** 2026-06-07
+
+## C# WinForms Code Generation
+
+| Model | Status | Notes |
+|-------|--------|-------|
+| Devstral 2.5B | ✅ **PASS** | Generated standalone complete code, compiled successfully |
+| Qwen2.5-7B | ❌ Partial | Uses partial class (incomplete for standalone) |
+| Starcoder2-15B | ❌ Incomplete | Generated placeholder code only |
+| Gemma4-MOE | ❌ Truncated | Hit token limit, incomplete |
+| DeepSeek-Coder-V2 | ❌ Partial | Uses partial class |
+| GLM-4-7 | Pending | Large model, still loading |
+
+## Python Tkinter Code Generation
+
+| Model | Status | Notes |
+|-------|--------|-------|
+| Qwen2.5-7B | ✅ **PASS** | Complete runnable Tkinter Notepad app |
+
+## C++ Win32 Code Generation
+
+| Model | Status | Notes |
+|-------|--------|-------|
+| Qwen2.5-7B | ⚠️ Partial | Basic skeleton, missing menu implementation |
+
+## Output Files
+
+- `codegen_output/devstral_notepad.exe` - Working compiled C# app
+- `codegen_output/devstral_notepad.cs` - Generated source
+- `codegen_output/py_notepad.py` - Complete Python app
+- `codegen_output/deepseek_notepad.cs` - Code requiring designer file
\ No newline at end of file
diff --git a/README.md b/README.md
index 2fb32488a11..140c35ea120 100644
--- a/README.md
+++ b/README.md
@@ -9,8 +9,8 @@ A clean, traceable, single-branch fork of Ollama with native AMD Radeon RX 9070
 This V5 update addresses the core inference hangs and kernel crashes on RX 9070 XT (gfx1201).
 
 ### The Fixes
-1. **DLL Mismatch Fix (The Hang):** The system's default `amdhip64.dll` shipped with Windows drivers does not perfectly match the ROCm 6.2+ SDK requirements, leading to silent hangs during `llama-server` context creation.
-   - **Solution:** We explicitly ship and use `amdhip64_7.dll` (from the ROCm toolkit) renamed to `amdhip64.dll` in the `lib/ollama/rocm/` folder. This forces `ggml-hip.dll` to link against the correct driver interface.
+1. **DLL Mismatch Fix (The Hang):** The system's default `amdhip64.dll` shipped with Windows drivers does not perfectly match the **ROCm 7.x** SDK requirements, leading to silent hangs during `llama-server` context creation.
+   - **Solution:** We explicitly ship and use `amdhip64_7.dll` (from the ROCm 7.x toolkit) renamed to `amdhip64.dll` in the `lib/ollama/rocm/` folder. This forces `ggml-hip.dll` to link against the correct driver interface.
 
 2. **rocWMMA Disabled (The Crash):** Hardware matrix cores (rocWMMA) on early RDNA4 drivers cause instability and severe performance regressions (up to 73%).
    - **Solution:** `rocWMMA` is explicitly disabled at compile time via `-DGGML_HIP_ROCWMMA=OFF`.
@@ -195,6 +195,20 @@ These are **stable, reproducible** numbers on a reference AMD Radeon RX 9070 XT
 **vs. Stock Ollama**: ~4x improvement in generation speed, preventing the silent CPU fallback.
 **vs. v4 patches**: ~15% faster thanks to conditional safety and disabled rocWMMA matrix cores.
 
+### Code Generation Quality Tests
+
+Code generation benchmark on RX 9070 XT:
+
+| Model | C# | Python | C++ |
+|-------|-----|--------|-----|
+| Devstral 2.5B | ✅ PASS | - | - |
+| Qwen2.5-7B | ⚠️ Partial | ✅ PASS | ⚠️ Partial |
+| Starcoder2-15B | ❌ Incomplete | - | - |
+| DeepSeek-Coder-V2 | ⚠️ Partial | - | - |
+| GLM-4-7 | Pending | - | - |
+
+See `CODEGEN_BENCHMARK.md` for full details.
+
 ---
 
 ## Troubleshooting
@@ -301,19 +315,17 @@ source ./scripts/env_gfx1201.sh
 
 ## Dashboard
 
-Build with `-tags dashboard` to include the web dashboard:
+The RDNA4 Dashboard is built-in and served automatically.
 
-```bash
-go build -tags dashboard -o ollama ./cmd/ollama
-```
+Ensure `dashboard.html` is in the same directory as your `ollama` executable (or running from the source root).
 
-Then open `http://localhost:11434/dashboard/` while Ollama is running.
+Then open `http://localhost:11434/dashboard/` in your browser while Ollama is running.
 
-The dashboard shows live GPU metrics from `/api/dashboard/gpu`:
-- Temperature, VRAM, utilization (read from rocm-smi/sysfs)
-- Active optimization status
-- Configuration warnings
-- Performance metrics
+The dashboard automatically polls the API to show:
+- Live Temperature, VRAM, and GPU utilization
+- Real-time generation speed (tok/s) and memory bandwidth
+- Active optimization status (Flash Attention, Hip Graphs, Wave32)
+- Configuration warnings (like TdrDelay warnings on Windows)
 
 ---
 
diff --git a/codegen_output/devstral_notepad.exe b/codegen_output/devstral_notepad.exe
new file mode 100644
index 00000000000..c0da2cda3fc
Binary files /dev/null and b/codegen_output/devstral_notepad.exe differ
diff --git a/codegen_results.txt b/codegen_results.txt
new file mode 100644
index 00000000000..6db773c2880
--- /dev/null
+++ b/codegen_results.txt
@@ -0,0 +1,13 @@
+# Final CodeGen Results
+======================
+
+C# WinForms Notepad:
+- Devstral: PASS (standalone, compiled to exe)
+- DeepSeek-Coder-V2: Partial class (incomplete)
+- Qwen/Starcoder/Gemma4: Partial class or placeholder
+
+Python Tkinter Notepad:
+- Qwen-2.5-7B: PASS (complete, runnable)
+
+C++ Win32 Notepad:
+- Qwen-2.5-7B: Partial skeleton (needs menu implementation)
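
The DLL mismatch fix in the README hunk comes down to a copy-and-rename step: `amdhip64_7.dll` from the ROCm 7.x toolkit is placed in `lib/ollama/rocm/` under the name `amdhip64.dll`. A minimal sketch of that step follows. The helper name `install_hip_runtime` and both directory arguments are hypothetical and are not taken from this repository's build scripts; only the two DLL file names and the `lib/ollama/rocm/` folder come from the README text.

```python
import shutil
from pathlib import Path


def install_hip_runtime(rocm_bin: Path, ollama_root: Path) -> Path:
    """Copy amdhip64_7.dll into lib/ollama/rocm/ as amdhip64.dll.

    rocm_bin and ollama_root are assumed locations supplied by the caller;
    this sketch does not try to discover where the ROCm toolkit is installed.
    """
    source = rocm_bin / "amdhip64_7.dll"
    if not source.is_file():
        raise FileNotFoundError(f"ROCm runtime not found: {source}")

    # Folder named in the README; created here if it does not exist yet.
    target_dir = ollama_root / "lib" / "ollama" / "rocm"
    target_dir.mkdir(parents=True, exist_ok=True)

    # The rename happens as part of the copy: same bytes, new file name.
    target = target_dir / "amdhip64.dll"
    shutil.copy2(source, target)
    return target
```

Calling `install_hip_runtime(Path(...), Path(...))` with the toolkit's binary directory and the Ollama install root returns the path of the copied file. Raising when the source DLL is missing is a deliberate choice in this sketch, so that a wrong toolkit path fails loudly instead of leaving the system's default `amdhip64.dll` in use.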