11 changes: 11 additions & 0 deletions .gitignore
@@ -29,3 +29,14 @@ incremental_db/
*.sof
*.sld
*.jdi

# PYNQ port (`pynq/` subdir) build artefacts
pynq/hw/build/
pynq/hw/sim/cocotb/sim_build/
pynq/hw/sim/cocotb/dump.vcd
pynq/hw/sim/cocotb/results.xml
pynq/hw/sim/cocotb/cocotb_env/
pynq/.Xil/
pynq/NA/
pynq/sw/notebooks/.ipynb_checkpoints/
pynq/.venv/
26 changes: 26 additions & 0 deletions NOTICE
@@ -7,3 +7,29 @@ otherwise noted.
This repository contains third-party reference material, tool-generated files,
and vendor IP with separate ownership and licensing terms. See
THIRD_PARTY_NOTICES.md for details.

----------------------------------------------------------------------

PYNQ-Z2 port (`pynq/` subdirectory)
Copyright 2026 Abdullah Al-Nafisah

The `pynq/` subdirectory adds a port of TALOS-V2 to the Xilinx PYNQ-Z2
(Zynq-7020 XC7Z020CLG400-1). It is self-contained and does not modify
the original Intel DE1-SoC flow at the repository root. Both flows can
coexist; users with both boards can program either from this single
repository.

Contents of `pynq/`:

- Files byte-identical to upstream `rtl/` are redistributed under
Apache-2.0 (copies kept in `pynq/hw/src/core/` and `pynq/hw/ip/`
so the Vivado build is self-contained).
- One file modified from upstream — `pynq/hw/src/core/include/
microgpt_exact_core_rom_init.svh` — paths adjusted for Vivado's
`INCLUDE_DIRS` mechanism. Modifications fall under Apache-2.0 §4.
- New original work — AXI4-Lite wrapper (`pynq/hw/src/top/`),
Vivado batch build (`pynq/hw/tcl/`), cocotb regression suite
(`pynq/hw/sim/cocotb/`), Python PYNQ driver (`pynq/sw/`),
notebooks, demos, tutorials, and built bitstream artefacts —
licensed under BSD 3-Clause; see `pynq/LICENSE.original` and the
per-file attribution in `pynq/UPSTREAM.md`.
34 changes: 34 additions & 0 deletions pynq/LICENSE.original
@@ -0,0 +1,34 @@
BSD 3-Clause License

Copyright (c) 2026, Abdullah Al-Nafisah

Files in this `pynq/` subdirectory that are NOT byte-identical to upstream
TALOS-V2 (see `UPSTREAM.md` for the per-file attribution) are licensed
under the BSD 3-Clause License below. Files that ARE byte-identical to
upstream remain governed by Apache-2.0 (`../LICENSE`).

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
141 changes: 141 additions & 0 deletions pynq/README.md
@@ -0,0 +1,141 @@
# TALOS-V2 — PYNQ-Z2 port

Self-contained Xilinx **PYNQ-Z2** (Zynq-7020 XC7Z020CLG400-1) port of
the [TALOS-V2](https://github.com/Luthiraa/TALOS-V2) "exact" microGPT
SystemVerilog accelerator. The PL keeps the upstream
`microgpt_exact_core` and its sub-blocks **byte-identical**; only the
host bridge, clocking, I/O, and host-side tooling are rewritten for the
Vivado / PYNQ flow. Per-file attribution lives in
[`UPSTREAM.md`](UPSTREAM.md).

The original Intel DE1-SoC flow at the [repository root](..) is
unchanged. Both flows coexist in this repository so users with either
board can build and run microGPT from a single clone.

**Licensing:** files byte-identical to upstream are governed by
Apache-2.0 (`../LICENSE`); original PYNQ-port additions are governed
by BSD 3-Clause (`LICENSE.original`). See `../NOTICE`.

## Directory layout

```
docs/                 Design notes + draft upstream license request
demos/                Pre-computed weight heatmaps for the portfolio site
hw/
  build/              Vivado project (gitignored output)
  constraints/        pynq_z2.xdc (LD0..LD3 only)
  ip/                 Q12 weight ROMs (.hex), origin: TALOS-V2
  src/
    core/             Unmodified TALOS-V2 RTL + .svh includes
    top/              microgpt_pynq_top.sv (AXI4-Lite wrapper, new)
  sim/cocotb/         cocotb regression suite for the AXI wrapper
  tcl/                build.tcl (Vivado batch build, new)
overlays/             microgpt.bit / microgpt.hwh land here
sw/
  drivers/            microgpt.py (pynq.MMIO driver, new)
  notebooks/          demo.ipynb, hardware_advantage.ipynb, throughput.ipynb
  tests/
tutorials/            Three-notebook workflow walkthrough
UPSTREAM.md           Per-file attribution (TALOS-V2 vs this fork)
LICENSE_STATUS.md     Why this repo is not yet open-source-redistributable
```

## Tutorials

Start with [`tutorials/00_overview.ipynb`](tutorials/00_overview.ipynb)
for the workflow loop, then `01_explore_weights.ipynb` to visualise
the Q12 ROMs, then `02_register_map_and_driver.ipynb` for the
AXI4-Lite layout and driver hot path.

## Quick-start

### 1. Build the bitstream

```bash
# From the pynq/ directory (all paths in this README are relative to it)
vivado -mode batch -source hw/tcl/build.tcl
```

This creates the Vivado project under `hw/build/`, runs synthesis and
implementation, and copies `microgpt.bit` + `microgpt.hwh` into
`overlays/`.

### 2. Deploy to the PYNQ-Z2

```bash
scp overlays/microgpt.bit overlays/microgpt.hwh \
xilinx@<board-ip>:/home/xilinx/pynq/overlays/microgpt/
scp -r sw/drivers sw/notebooks \
xilinx@<board-ip>:/home/xilinx/jupyter_notebooks/microgpt/
```

### 3. Run on the board

Open Jupyter (`http://<board-ip>:9090`) and run
`microgpt/notebooks/demo.ipynb` (copied from `sw/notebooks/` in step 2),
or from a Python shell:

```python
from microgpt import MicroGPT
gpt = MicroGPT()
text, info = gpt.generate(max_tokens=8, temperature=1.0, seed=42)
print(text, info["cycles"])
```
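
Before calling `generate`, it can help to confirm that the overlay actually loaded by reading the Magic and Version registers listed in the register map in this README. The sketch below is not taken from `microgpt.py`: `probe_overlay` and `FakeRegs` are hypothetical names, and the reader only needs a `read(offset)` method (on the board, a `pynq.MMIO(0x4000_0000, 0x1000)` instance should fit that shape). Splitting Version into 16-bit major/minor halves is an assumption.

```python
MAGIC_OFFSET = 0x000      # RO, expected 0x4D475254 ("MGRT")
VERSION_OFFSET = 0x004    # RO, README lists 0x00020001
EXPECTED_MAGIC = 0x4D475254


def probe_overlay(regs):
    """Return (major, minor) from the Version register, or raise if Magic is wrong.

    `regs` is any object with a read(offset) -> int method. The major/minor
    split is an assumption; the README only gives the raw Version value.
    """
    magic = regs.read(MAGIC_OFFSET) & 0xFFFFFFFF
    if magic != EXPECTED_MAGIC:
        raise RuntimeError(f"unexpected magic 0x{magic:08X}; is the overlay loaded?")
    version = regs.read(VERSION_OFFSET) & 0xFFFFFFFF
    return version >> 16, version & 0xFFFF


class FakeRegs:
    """Hypothetical stand-in for an MMIO window, for off-board testing."""

    def __init__(self, values):
        self._values = dict(values)

    def read(self, offset):
        # Unmapped offsets read back as zero in this fake.
        return self._values.get(offset, 0)


if __name__ == "__main__":
    fake = FakeRegs({MAGIC_OFFSET: 0x4D475254, VERSION_OFFSET: 0x00020001})
    print(probe_overlay(fake))
```

Taking a duck-typed reader rather than importing `pynq` keeps the check runnable on a development machine as well as on the board.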

## Register map (AXI4-Lite slave at 0x4000_0000, 4 KB)

| Offset | RW | Field |
|--------:|:--:|:--------------------------------------------------------------|
| 0x000 | RO | Magic = `0x4D475254` ("MGRT") |
| 0x004 | RO | Version = `0x00020001` |
| 0x008 | WO | bit0 = start pulse, bit1 = clear pulse |
| 0x00C | RO | Status `{pos, out_len, 0, 0, direct_mode, toggle, error, done, busy, ready}` |
| 0x010 | RW | Config `{temp_q8_8[31:16], max_gen[15:8], 0[7:0]}` |
| 0x014 | RW | RNG seed |
| 0x018 | RO | `{top_logit_q12[31:16], argmax_token[15:8], last_token[7:0]}` |
| 0x01C | RO | BOS_TOKEN (`26`) |
| 0x020 | RW | Step config `{0, step_token, step_pos, step_clear, direct_mode}` |
| 0x024 | WO | Step trigger pulse (bit0) |
| 0x028 | RO | heartbeat_reg snapshot (debug; zero-padded to 32b) |
| 0x060.. | RO | `output_mem[0..15]` -- 16 generated tokens |
| 0x0D8 | RO | perf_cycles |
| 0x0DC | RO | tokens_per_sec |
| 0x100.. | RO | 27 sign-extended logits (Q12) |
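
The packed fields in the table above can be built and decoded with a few shifts and masks. This is a minimal sketch, not the driver's code: the function names are hypothetical, and it assumes `temp_q8_8` is unsigned Q8.8, `top_logit_q12` is a signed 16-bit Q12 value, and each logit in the `0x100..` bank occupies one sign-extended 32-bit word.

```python
def temperature_to_q8_8(temperature: float) -> int:
    """Convert a float temperature to unsigned Q8.8 (rounding and clamping are assumptions)."""
    return max(0, min(0xFFFF, round(temperature * 256)))


def pack_config(temp_q8_8: int, max_gen: int) -> int:
    """Pack the Config word (0x010): {temp_q8_8[31:16], max_gen[15:8], 0[7:0]}."""
    if not 0 <= temp_q8_8 <= 0xFFFF:
        raise ValueError("temp_q8_8 must fit in 16 bits")
    if not 0 <= max_gen <= 0xFF:
        raise ValueError("max_gen must fit in 8 bits")
    return (temp_q8_8 << 16) | (max_gen << 8)


def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def unpack_result(word: int) -> dict:
    """Unpack 0x018: {top_logit_q12[31:16], argmax_token[15:8], last_token[7:0]}."""
    return {
        "top_logit": sign_extend_16(word >> 16) / 4096.0,  # Q12 -> float
        "argmax_token": (word >> 8) & 0xFF,
        "last_token": word & 0xFF,
    }


def q12_word_to_float(word: int) -> float:
    """Interpret a 32-bit sign-extended Q12 word as a float."""
    word &= 0xFFFFFFFF
    signed = word - (1 << 32) if word & 0x80000000 else word
    return signed / 4096.0


if __name__ == "__main__":
    cfg = pack_config(temperature_to_q8_8(1.0), max_gen=8)
    print(f"config word: 0x{cfg:08X}")
    print(unpack_result(0xF0001A05))
```

With these assumptions, a temperature of 1.0 maps to `0x0100` in Q8.8, and dividing a Q12 integer by 4096 recovers the real-valued logit.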

PL LEDs LD0..LD3 expose `{heartbeat, busy, done, error}` (heartbeat moved
to LD0 so it stays visible even on boards where LD3/M14 has a physical
fault, as observed on the deployed PYNQ-Z2 unit).
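
The `busy`, `done`, and `error` flags shown on the LEDs are also readable from the Status word at `0x00C`. The sketch below decodes the low flag bits and polls until the core finishes. The bit positions are inferred from the concatenation order in the register map (rightmost field is bit 0) and assume each flag is one bit wide; the widths of `pos` and `out_len` are not stated, so they are left undecoded. `decode_status_flags` and `wait_until_done` are hypothetical names, not part of the driver.

```python
from __future__ import annotations

STATUS_OFFSET = 0x00C

# Inferred from {pos, out_len, 0, 0, direct_mode, toggle, error, done, busy, ready}:
# the rightmost field of the concatenation is taken to be bit 0.
STATUS_BITS = {"ready": 0, "busy": 1, "done": 2, "error": 3, "toggle": 4, "direct_mode": 5}


def decode_status_flags(word: int) -> dict[str, bool]:
    """Decode the single-bit flags in the low bits of the Status word."""
    return {name: bool((word >> bit) & 1) for name, bit in STATUS_BITS.items()}


def wait_until_done(read_status, max_polls: int = 1_000_000) -> dict[str, bool]:
    """Poll until done or error is set.

    read_status is a zero-argument callable returning the raw Status word,
    for example a lambda wrapping an MMIO read of STATUS_OFFSET.
    """
    for _ in range(max_polls):
        flags = decode_status_flags(read_status())
        if flags["done"] or flags["error"]:
            return flags
    raise TimeoutError("core did not report done or error")


if __name__ == "__main__":
    # Simulated sequence: busy, busy, then ready + done.
    words = iter([0b000010, 0b000010, 0b000101])
    print(wait_until_done(lambda: next(words)))
```

A bounded poll count is used here instead of a wall-clock timeout so the example stays deterministic; a real driver would more likely use a timeout or the interrupt path.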

## Avalon-MM -> AXI4-Lite translation summary

| DE1-SoC (Avalon-MM) | PYNQ-Z2 (AXI4-Lite) |
|----------------------------------------------------|--------------------------------------------------|
| `jtag_microgpt_bridge` master + `waitrequest`/`readdatavalid` handshakes | Standard AXI4-Lite slave on PS GP0 (`s_axi_*`). |
| 50 MHz `CLOCK_50` host domain + 56.25 MHz core PLL | Single domain `s_axi_aclk = FCLK_CLK0 = 50 MHz`. |
| Toggle-bit triggers (`host_start_toggle_50` etc.) crossed via 2-FF synchronizers | 1-cycle `start_pulse` / `clear_pulse` / `step_pulse` decoded inline. |
| `host_toggle_reg` flips on every JTAG read or write | `host_toggle_reg` flips on every successful AXI read or write. |
| WSTRB / byte enables driven by JTAG bridge (4'b1111) | `s_axi_wstrb` accepted but ignored; aligned 32-bit writes only. |
| Resets: `~SW[1] && pll_locked` | `s_axi_aresetn` from `proc_sys_reset` driven by `FCLK_RESET0_N`. |
| Outputs: 10x LEDR + 6x HEX | 4x PL LEDs (LD0..LD3): `heartbeat`, `busy`, `done`, `error`. |
| Weights via `$readmemh("generated/...hex", ...)` | Weights live in `hw/ip/`; build.tcl adds it to `INCLUDE_DIRS` and `rom_init.svh` references bare filenames. |

## Notes

- The core RTL lives in `hw/src/core/` and includes
`microgpt_exact_core_params.svh`, `microgpt_exact_core_math.svh`,
and `microgpt_exact_core_rom_init.svh`. Only the last of these is
modified (its `$readmemh` paths; see `UPSTREAM.md`). Parameters (e.g.
`EMBED_DIM`, `VOCAB_SIZE`, `FRAC_BITS`) are unchanged from the DE1 build.
- The build script targets `xc7z010clg400-1`, but the PYNQ-Z2 carries
the larger XC7Z020 (`xc7z020clg400-1`), so set the `part` variable at
the top of `hw/tcl/build.tcl` accordingly before building.
- All RTL uses 4-space indentation, no tabs.


## Run build

```bash
mkdir -p hw/build
vivado -mode batch -source hw/tcl/build.tcl \
-log hw/build/vivado_build.log \
-journal hw/build/vivado_build.jou \
2>&1 | tee hw/build/build_console.log
```

45 changes: 45 additions & 0 deletions pynq/UPSTREAM.md
@@ -0,0 +1,45 @@
# Upstream attribution

This repository is a **port** of the SystemVerilog inference core from
[`Luthiraa/TALOS-V2`](https://github.com/Luthiraa/TALOS-V2), an RTL
implementation of a Karpathy-style microGPT for the Intel DE1-SoC
(Cyclone V), to the **Xilinx PYNQ-Z2** (Zynq-7020 XC7Z020CLG400-1).

All credit for the inference core RTL and the underlying numerical
design (Q12 fixed-point, systolic matvec tile, processing-element
array, RMS-norm + saturating-divider engines, categorical sampler)
belongs to the upstream author(s) of TALOS-V2.

This fork's contribution is the *host-side bridge to PYNQ*: an
AXI4-Lite slave wrapper, a Vivado batch build, a cocotb regression
suite for the wrapper, and a Python (`pynq.MMIO` + UIO IRQ) driver.

## Per-subtree origin

| Subtree in this repo | Origin | Modifications |
| ----------------------------------- | ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `hw/src/core/*.sv` (7 files) | `Luthiraa/TALOS-V2/rtl/src/` | **Unmodified, byte-identical** to upstream. |
| `hw/src/core/include/microgpt_exact_core_math.svh` | `Luthiraa/TALOS-V2/rtl/src/include/` | **Unmodified, byte-identical** to upstream. |
| `hw/src/core/include/microgpt_exact_core_params.svh` | `Luthiraa/TALOS-V2/rtl/src/include/` | **Unmodified, byte-identical** to upstream. |
| `hw/src/core/include/microgpt_exact_core_rom_init.svh` | `Luthiraa/TALOS-V2/rtl/src/include/` | Modified: paths updated for Vivado `INCLUDE_DIRS` and bare-filename `$readmemh` references. |
| `hw/ip/*.hex` (9 weight ROMs) | `Luthiraa/TALOS-V2/rtl/generated/` | Unmodified Q12 fixed-point exports of the upstream-trained microGPT weights. |
| `hw/src/top/microgpt_pynq_top.sv` | **New (this fork)** | AXI4-Lite slave wrapper exposing the upstream core via the Zynq PS GP0. |
| `hw/tcl/build.tcl` | **New (this fork)** | Vivado batch build (Zynq + AXI Interconnect + top + constraints). |
| `hw/sim/cocotb/` | **New (this fork)** | cocotb regression suite targeting `microgpt_pynq_top` (caught a production write-path bug pre-bitstream). |
| `sw/drivers/microgpt.py` | **New (this fork)** | Python MMIO driver, IRQ fast path via `/dev/uio<n>`. |
| `sw/notebooks/*.ipynb` | **New (this fork)** | Demo, hardware-advantage, throughput notebooks for the deployed overlay. |
| `overlays/*.bit`, `*.hwh` | **New (this fork)** | Vivado-built artefacts targeting `xc7z010clg400-1` / `xc7z020clg400-1`. |
| `demos/build.py` | **New (this fork)** | Weight-tensor heatmap renderer for the companion portfolio site. |
| `tutorials/` | **New (this fork)** | Workflow walkthrough notebooks. |

## Conventions adopted

- All upstream files retain their original headers, naming, parameter
values (`EMBED_DIM`, `VOCAB_SIZE`, `FRAC_BITS`, …), and behaviour.
- The cocotb tests target only the **new** AXI wrapper (`hw/src/top/`);
upstream core behaviour is **not** retested here — that responsibility
remains with the upstream ModelSim testbenches.
- No upstream file in `hw/src/core/` should be edited in this repo.
If the upstream core needs a fix, the fix belongs in upstream and
this repo pulls it in via a fresh copy + a noted update in this
file's modification log.
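
One way to check the byte-identical claim after pulling a fresh upstream copy is to compare file hashes. This is a minimal sketch under stated assumptions: `diff_against_upstream` is a hypothetical helper, and the directory arguments in the usage line are placeholders for wherever the local and upstream trees live.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_against_upstream(local_dir: Path, upstream_dir: Path, pattern: str = "*.sv") -> list[str]:
    """List files in local_dir matching pattern that are missing from,
    or differ byte-for-byte from, the same-named file in upstream_dir."""
    mismatched = []
    for local in sorted(local_dir.glob(pattern)):
        upstream = upstream_dir / local.name
        if not upstream.is_file() or sha256_of(local) != sha256_of(upstream):
            mismatched.append(local.name)
    return mismatched


if __name__ == "__main__":
    # Placeholder paths; adjust to the actual checkout layout.
    print(diff_against_upstream(Path("hw/src/core"), Path("../rtl/src")))
```

An empty list means every matched file is byte-identical; `microgpt_exact_core_rom_init.svh` is expected to differ, so it would show up if the pattern is widened to `*.svh` in the include directory.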
120 changes: 120 additions & 0 deletions pynq/demos/build.py
@@ -0,0 +1,120 @@
"""
Build precomputed demo assets for the website project page.

Renders each of the nine INT16 (Q12) weight tensors (4,192 parameters in
total) baked into the PL fabric as a heatmap PNG, plus a metadata JSON
(labels and summary statistics) consumed by the website's HeatmapViewer.
No inference is run; this is a faithful visualisation of what is
hardcoded into LUTRAM/BRAM at synthesis time.

Refresh:

python3 demos/build.py
cp demos/out/* ../AbdullahAlNafisah.github.io/public/demos/pynq-microgpt/

Pure numpy + matplotlib. Q12 sign convention: each value is a 16-bit
two's-complement int read as Q12 fixed-point (one sign bit, 3 integer
bits, 12 fractional bits → range [-8, 8)).
"""

from __future__ import annotations

import json
from pathlib import Path

import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

REPO_ROOT = Path(__file__).resolve().parent.parent
IP_DIR = REPO_ROOT / "hw" / "ip"
OUT = Path(__file__).parent / "out"
OUT.mkdir(parents=True, exist_ok=True)

FRAC_BITS = 12 # matches hw/src/core/include/microgpt_exact_core_params.svh
EMBED_DIM = 16
VOCAB_SIZE = 27 # 26 letters + BOS sentinel
MLP_DIM = 64
SEQ_LEN = 16 # wpe_q12.hex has 16*16 entries


def load_q12_hex(path: Path, shape: tuple[int, int]) -> np.ndarray:
    """Read a .hex file (one 16-bit hex word per line) into a float Q12 matrix."""
    raw = np.array(
        [int(line.strip(), 16) for line in path.read_text().splitlines() if line.strip()],
        dtype=np.uint16,
    )
    signed = raw.astype(np.int32)
    signed[signed >= 0x8000] -= 0x10000  # two's complement → signed int16 in int32 storage
    fp = signed.astype(np.float64) / (1 << FRAC_BITS)
    assert fp.size == shape[0] * shape[1], f"{path.name}: expected {shape}, got {fp.size} values"
    return fp.reshape(shape)


# (filename, label, shape, blurb)
WEIGHTS: list[tuple[str, str, tuple[int, int], str]] = [
    ("wte_q12.hex", "WTE — token embedding", (VOCAB_SIZE, EMBED_DIM), "27 tokens × 16-dim embedding"),
    ("wpe_q12.hex", "WPE — positional embedding", (SEQ_LEN, EMBED_DIM), "16 positions × 16-dim embedding"),
    ("layer0_attn_wq_q12.hex", "W_Q — attention query", (EMBED_DIM, EMBED_DIM), "16 × 16"),
    ("layer0_attn_wk_q12.hex", "W_K — attention key", (EMBED_DIM, EMBED_DIM), "16 × 16"),
    ("layer0_attn_wv_q12.hex", "W_V — attention value", (EMBED_DIM, EMBED_DIM), "16 × 16"),
    ("layer0_attn_wo_q12.hex", "W_O — attention output", (EMBED_DIM, EMBED_DIM), "16 × 16"),
    ("layer0_mlp_fc1_q12.hex", "FC1 — MLP up-projection", (EMBED_DIM, MLP_DIM), "16 × 64"),
    ("layer0_mlp_fc2_q12.hex", "FC2 — MLP down-projection", (MLP_DIM, EMBED_DIM), "64 × 16"),
    ("lm_head_q12.hex", "LM head — logits projection", (EMBED_DIM, VOCAB_SIZE), "16 × 27"),
]


def render_heatmap(arr: np.ndarray, label: str, out_path: Path) -> None:
    """Render a heatmap with a colour scale symmetric around zero.

    The PNG has no axes or title; `label` is unused here and is carried in
    the metadata JSON instead.
    """
    vmax = float(np.max(np.abs(arr))) or 1e-9  # avoid a zero-width colour range
    fig, ax = plt.subplots(figsize=(4.0, 4.0 * arr.shape[0] / max(arr.shape[1], 1)), dpi=200)
    ax.imshow(arr, cmap="RdBu_r", vmin=-vmax, vmax=vmax, interpolation="nearest", aspect="auto")
    ax.set_axis_off()
    fig.subplots_adjust(left=0.02, right=0.98, top=0.98, bottom=0.02)
    fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)


def main() -> None:
    presets = []
    total_params = 0
    for fname, label, shape, blurb in WEIGHTS:
        path = IP_DIR / fname
        w = load_q12_hex(path, shape)
        total_params += w.size
        png_name = fname.replace("_q12.hex", ".png")
        png_path = OUT / png_name
        render_heatmap(w, label, png_path)
        presets.append({
            "name": label,
            "image_url": f"/demos/pynq-microgpt/{png_name}",
            "subtitle": (
                f"{blurb} · {w.size} INT16 (Q12) values · "
                f"|w| ≤ {float(np.max(np.abs(w))):.2f} · σ = {float(np.std(w)):.3f}"
            ),
        })

    meta = {
        "kind": "heatmap",
        "title": (
            f"microgpt fabric weights · char-level GPT · {EMBED_DIM}-dim · 1 block · "
            f"{total_params} INT16 (Q12) params total"
        ),
        "image_label": "weight matrix (red = positive · blue = negative · scaled per matrix)",
        "presets": presets,
        "caption": (
            "Each preset renders one of the nine weight tensors literally "
            "hardcoded into PL fabric (LUTRAM / BRAM / constants) by the "
            "synthesizer — no DRAM, no DMA. Values read straight from "
            "hw/ip/*.hex and interpreted as 16-bit Q12 fixed-point. "
            "Generated by demos/build.py."
        ),
    }
    (OUT / "weights.json").write_text(json.dumps(meta, indent=2))
    print(f"Wrote {len(presets)} presets ({total_params} params) to {OUT}")


if __name__ == "__main__":
    main()
Binary file added pynq/demos/out/layer0_attn_wk.png
Binary file added pynq/demos/out/layer0_attn_wo.png
Binary file added pynq/demos/out/layer0_attn_wq.png
Binary file added pynq/demos/out/layer0_attn_wv.png
Binary file added pynq/demos/out/layer0_mlp_fc1.png
Binary file added pynq/demos/out/layer0_mlp_fc2.png
Binary file added pynq/demos/out/lm_head.png