Before beginning the labs, you must first set up the required toolchain. This process may present some challenges, so please refer to the debugging notes in debug.md if you encounter any issues.
This lab uses the PYNQ-Z1 board and requires Windows or Linux for development. Please ensure you have Vivado 2020.2 installed, as newer versions may have IP template changes that could affect the lab exercises.
Note: This lab was originally developed on Windows 10 using PYNQ v2.7 and Xilinx Vivado 2020.2 (Kevin), and has been tested on Windows 11 using PYNQ v3.1 and Xilinx Vivado 2020.2 (Cheng).
Important: Vivado projects and TCL scripts can be opened in the same or a newer Vivado release than the one that created them, but not in an older one.
Source: https://pynq.readthedocs.io/en/latest/pynq_sd_card.html
If you encounter any issues, please first consult the debugging notes in debug.md before contacting the TAs or module leader.
Operating System Requirements: Windows 10/11 or Linux (Ubuntu)
- macOS users: please use one of the following options:
  - Virtual machine (VM)
  - Lab computers
Download Vivado from the Xilinx download page. If you have limited storage space, refer to debug.md for guidance.
Download the PYNQ SD Card Image and write it to your SD card using Raspberry Pi Imager or a similar tool.
📝 Watch the video: Setting up your PYNQ-Z1 board before proceeding.
After flashing the PYNQ image onto your SD card, insert it into the PYNQ-Z1 board and connect power. The board can be powered via either a micro-USB cable or a 5V power supply. Ensure the power jumper is set to the appropriate position (USB or external power).
Let's examine how the board works. The central component is the Zynq-7000 System-on-Chip (SoC), which consists of:
- Processing System (PS): A dual-core ARM Cortex-A9 processor
- Programmable Logic (PL): FPGA fabric that can be configured with custom hardware designs
The block diagram for the Zynq-7000 SoC is shown below:
Source: https://www.mouser.co.uk/new/xilinx/xilinx-zynq-7000-socs/
The PYNQ framework provides a complete Ubuntu-based Linux distribution on the SD card, including Linux drivers for the PS-PL interfaces wrapped in Python libraries for easier development. The following slides from Xilinx's PYNQ introduction provide an excellent visual representation of the system architecture:
Source: https://github.com/Xilinx/PYNQ_Workshop/blob/master/01_PYNQ_Workshop_introduction.pdf
PYNQ uses a web-based Jupyter Notebook interface for interacting with the FPGA board. Connect an Ethernet cable between the board and your computer. If your computer lacks an Ethernet port, use an Ethernet adapter.
Note: When connecting directly to your computer via Ethernet, PYNQ will not have internet access unless you bridge your computer's internet connection. Without internet access, you cannot update system packages.
PYNQ uses the static IP address 192.168.2.99 by default. Configure your computer to use an IP address on the same subnet (i.e., 192.168.2.X) to access the Jupyter Notebook server.
Windows:
- Open Network and Sharing Center
- Click on the Ethernet connection
- Click Properties
- Double-click Internet Protocol Version 4 (TCP/IPv4)
- Assign a static IP address `192.168.2.X` (where X is any number from 1-254, except 99) with subnet mask `255.255.255.0`
Linux: Refer to the PYNQ documentation: https://pynq.readthedocs.io/en/latest/appendix/assign_a_static_ip.html#assign-a-static-ip-address
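If you want to double-check an address before typing it in, the addressing rule can be expressed in a few lines of Python. This is a sketch using the standard-library `ipaddress` module; the helper name `is_valid_host_ip` is hypothetical, and a /24 subnet mask (255.255.255.0) is assumed.

```python
import ipaddress

# PYNQ-Z1 default static address
PYNQ_IP = ipaddress.ip_address("192.168.2.99")
# Assumption: /24 mask (255.255.255.0), i.e. the 192.168.2.X subnet
PYNQ_NET = ipaddress.ip_network("192.168.2.0/24")


def is_valid_host_ip(candidate: str) -> bool:
    """Return True if `candidate` can be used as the computer's static IP."""
    ip = ipaddress.ip_address(candidate)
    return (
        ip in PYNQ_NET                          # same subnet as the board
        and ip != PYNQ_IP                       # must not clash with the board
        and ip != PYNQ_NET.network_address      # 192.168.2.0 names the network
        and ip != PYNQ_NET.broadcast_address    # 192.168.2.255 is broadcast
    )


for addr in ["192.168.2.1", "192.168.2.99", "192.168.1.5", "192.168.2.255"]:
    print(addr, is_valid_host_ip(addr))
```

Only the first address in the loop is usable: `.99` is the board itself, `192.168.1.5` is on a different subnet, and `.255` is the broadcast address.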
Open your web browser and navigate to: 192.168.2.99:9090. Enter the default password: xilinx
Connection Issues? If your PYNQ board refuses to connect, refer to the troubleshooting section in debug.md. You may need to use SSH or a serial terminal for debugging.
Terminals are useful tools for controlling remote, low-powered devices with limited I/O hardware. While not strictly necessary (you can interact with PYNQ through the Jupyter Notebook interface), having terminal access is valuable for troubleshooting connection issues.
The debug.md guide explains how to access a terminal via SSH or serial console.
You are now ready to create your first FPGA design. This section will guide you through implementing a Finite Impulse Response (FIR) filter.
Credits: This section is based on Jeff Johnson's tutorial. If you need additional guidance, refer to his YouTube tutorial. Thanks also to Sanjit Raman for providing screenshots. Note that some screenshots show Vivado 2024.2 on Ubuntu; your version may differ slightly, but the steps remain the same.
- Open Vivado and select Create Project
- Click Next to proceed through the wizard
- Enter a project name (e.g., `lab1`) and location. Ensure Create project subdirectory is checked
- Click Next
- Select RTL Project and choose Do not specify sources at this time
- Click Next
- Select the PYNQ-Z1 board from the Boards tab. Double-click the part number to select it
- Click Finish

Board Part Troubleshooting: If you cannot find the PYNQ-Z1 board, refer to debug.md. Alternatively, switch to the Parts tab and select the part number `xc7z020clg400-1` directly.
In the left sidebar under IP INTEGRATOR, click Create Block Design. You can use the default name design_1.
Add the ZYNQ7 Processing System IP to your block design. This component provides the interface to the dual ARM Cortex-A9 cores. Double-click the ZYNQ7 PS block to open its configuration. Note the section for HP (High Performance) slave ports—you'll need one port (HP0) for this design.
Add the AXI Direct Memory Access (DMA) IP to your block design.
Double-click the AXI DMA block to configure it:
- Uncheck **Enable Scatter Gather Engine**
- Set **Width of Buffer Length Register** to 26 bits (maximum)
- Click OK to save
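The buffer length register counts bytes, so its width limits the size of a single simple-mode (non-scatter-gather) DMA transfer. Here is a quick calculation that assumes 32-bit samples, matching the FIR data width used in this lab:

```python
# Width of Buffer Length Register, as set in the AXI DMA dialog
BUFFER_LENGTH_WIDTH = 26
# Assumption: each sample is one 32-bit word (the FIR data width in this lab)
BYTES_PER_SAMPLE = 4

max_bytes = 2**BUFFER_LENGTH_WIDTH - 1         # the length register counts bytes
max_samples = max_bytes // BYTES_PER_SAMPLE    # whole 32-bit samples per transfer

print(f"max transfer: {max_bytes} bytes = {max_samples} samples")
```

Signals longer than this limit would have to be split across several transfers.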
Add the FIR Compiler IP to design your filter.
Double-click the FIR Compiler block to configure it. In the Filter Options tab, paste the following coefficients:
-255, -260, -312, -288, -144, 153, 616, 1233, 1963, 2739, 3474, 4081, 4481, 4620, 4481, 4081, 3474, 2739, 1963, 1233, 616, 153, -144, -288, -312, -260, -255
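To see what the hardware computes, here is a plain-Python reference model of a direct-form FIR filter that uses the same 27 coefficients. Use it as a software sketch for checking results. It is not how the FIR Compiler implements the filter internally. It also keeps full-precision Python integers and ignores the output rounding and output width settings.

```python
# Coefficients pasted into the FIR Compiler (27 taps)
COEFFS = [-255, -260, -312, -288, -144, 153, 616, 1233, 1963, 2739, 3474,
          4081, 4481, 4620, 4481, 4081, 3474, 2739, 1963, 1233, 616, 153,
          -144, -288, -312, -260, -255]


def fir(samples, coeffs=COEFFS):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


# An impulse input reproduces the coefficients (the impulse response is h[k])
impulse = [1] + [0] * (len(COEFFS) - 1)
assert fir(impulse) == COEFFS

# The taps are symmetric, so the filter has linear phase
assert COEFFS == COEFFS[::-1]

# A constant input settles to input * sum(h), the filter's DC gain
print(fir([1] * 40)[-1], sum(COEFFS))
```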
In the Channel Specification tab:
- Set **Input Sampling Frequency** to 100 MHz
- Set **Clock Frequency** to 100 MHz

Because the sampling frequency equals the clock frequency, the filter processes one input sample per clock cycle.
In the Implementation tab:
- Set **Input Data Width** to 32 bits
- Set **Output Rounding Mode** to Non Symmetric Rounding Up
- Set **Output Width** to 32 bits
In the Interface tab:
- Enable **Output TREADY**
- Enable **TLAST** via **Packet Framing**

These settings configure the AXI4-Stream handshake signals that the filter uses to communicate with the DMA.
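Both options relate to the AXI4-Stream valid/ready handshake. TREADY lets a downstream block (here, the DMA's S2MM channel) apply back-pressure. TLAST marks the final beat of a packet, and the S2MM channel uses it to end a transfer. Below is a small cycle-level Python model of the handshake. The function names are illustrative and are not part of any Xilinx API.

```python
def axis_transfers(tvalid, tready, tdata, tlast):
    """Return (data, last) for every cycle in which a transfer actually occurs.

    AXI4-Stream rule: a beat is transferred only on a clock edge where
    both TVALID (source has data) and TREADY (sink can accept) are high.
    """
    beats = []
    for v, r, d, l in zip(tvalid, tready, tdata, tlast):
        if v and r:
            beats.append((d, l))
    return beats


def split_packets(beats):
    """Group transferred beats into packets, closing each one at TLAST."""
    packets, current = [], []
    for data, last in beats:
        current.append(data)
        if last:
            packets.append(current)
            current = []
    return packets


# The sink de-asserts TREADY in cycle 2 (back-pressure), so the source
# holds the same data (30) until it is accepted in cycle 3.
tvalid = [1, 1, 1, 1, 1]
tready = [1, 1, 0, 1, 1]
tdata = [10, 20, 30, 30, 40]
tlast = [0, 0, 0, 0, 1]

beats = axis_transfers(tvalid, tready, tdata, tlast)
print(split_packets(beats))  # [[10, 20, 30, 40]]
```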
Click OK to save the configuration.
Connect the IP blocks:
- Connect `M_AXIS_DATA` (FIR Compiler output) → `S_AXIS_S2MM` (AXI DMA input). This sends processed data from the filter to memory via the DMA.
- Connect `M_AXIS_MM2S` (AXI DMA output) → `S_AXIS_DATA` (FIR Compiler input). This streams data that the DMA reads from memory into the filter's streaming interface.
Your block diagram should look like this:

Now, we connect this up to the ZYNQ Processing System, so that the DMA can access the DDR Memory that is present in the PS.
Double click the ZYNQ7 Processing System to edit it, and double click on the High Performance AXI Slave Ports to edit them. Enable one port, for example the HP0 port. Then save and exit the customization.
Next, Run Block Automation.
Then run Connection Automation, selecting all the ports in the tree view. Vivado connects the AXI interfaces, clocks, and resets automatically.
Press F6 to validate your design. You will see incomplete address path warnings. Run Connection Automation again.
Rename the FIR Compiler block to `fir` and the AXI DMA block to `fir_dma`. PYNQ exposes each IP block under its name, so short names make the accelerators easier to reference from the Jupyter Notebook (e.g., `overlay.fir_dma`).
You should have a design that looks something like this:
Once the design is complete, press F6 to validate it. If validation succeeds, right-click `design_1.bd` under "Design Sources" in the "Sources" window and select "Create HDL Wrapper". Then, in the left sidebar, run "Generate Bitstream". This automatically runs Synthesis and Implementation first.
Synthesis translates your HDL code into a gate-level netlist of logical components (LUTs, flip-flops, DSPs, etc.) that can be implemented on the FPGA fabric. Implementation then places those components onto physical FPGA resources and routes the connections between them, while bitstream generation creates the binary configuration file that programs the FPGA.
To run your design on the PYNQ board, you need three files: a `.tcl` file, a `.hwh` file, and a `.bit` file.
Export the required files:
- TCL file: go to File → Export → Export Block Design
- HWH file: navigate to `<project>/lab1.gen/sources_1/bd/design_1/hw_handoff/design_1.hwh`
- BIT file: navigate to `<project>/lab1.runs/impl_1/design_1_wrapper.bit` and rename it to `design_1.bit` so that it shares the base name `design_1` with the other two files

(Replace `<project>` with your actual project directory.)
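PYNQ expects the bitstream and its metadata file (`.hwh`, and `.tcl` in older releases) to sit in the same folder under the same base name. That is why the bitstream is renamed. Below is a sketch of a pre-upload check. The helper `missing_overlay_files` is hypothetical and is not part of PYNQ.

```python
import tempfile
from pathlib import Path


def missing_overlay_files(folder, base="design_1"):
    """List which of <base>.bit, <base>.hwh, <base>.tcl are absent from `folder`."""
    folder = Path(folder)
    expected = [folder / f"{base}{ext}" for ext in (".bit", ".hwh", ".tcl")]
    return [p.name for p in expected if not p.is_file()]


# Demo in a scratch folder: the un-renamed bitstream is not picked up
with tempfile.TemporaryDirectory() as tmp:
    for name in ("design_1_wrapper.bit", "design_1.hwh", "design_1.tcl"):
        (Path(tmp) / name).touch()
    before = missing_overlay_files(tmp)   # ['design_1.bit']
    (Path(tmp) / "design_1_wrapper.bit").rename(Path(tmp) / "design_1.bit")
    after = missing_overlay_files(tmp)    # []

print(before, after)
```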
- Ensure your laptop is connected to the PYNQ board via Ethernet
- Open a browser and navigate to `192.168.2.99:9090`
- Enter the password: `xilinx`
- Create a new folder (e.g., `InfoProc-lab1`)
- Upload the three files (`.tcl`, `.hwh`, `.bit`) to this folder
- Upload the provided Jupyter Notebook from `jupyter_notebook/lab1/fir.ipynb`
- Open the Jupyter Notebook and execute the cells
- Follow the instructions and observe the performance difference between hardware and software implementations of the FIR filter
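The notebook drives the filter through the DMA. The sketch below shows the core pattern. It assumes PYNQ's AXI DMA driver (`sendchannel`/`recvchannel`, each with `transfer()` and `wait()`) and the `fir_dma` name chosen earlier. The names `run_fir_on_hardware` and `FakeDMA` are hypothetical, and the provided notebook's code may differ. The stand-in DMA only records the call order, so the sketch can run without a board.

```python
def run_fir_on_hardware(dma, in_buf, out_buf):
    """Stream in_buf through the filter and collect the output in out_buf.

    `dma` is assumed to expose PYNQ's AXI DMA interface
    (sendchannel/recvchannel with transfer() and wait()).
    """
    dma.sendchannel.transfer(in_buf)    # MM2S: DDR -> FIR input stream
    dma.recvchannel.transfer(out_buf)   # S2MM: FIR output stream -> DDR
    dma.sendchannel.wait()              # block until all input has been sent
    dma.recvchannel.wait()              # block until the output transfer completes
    return out_buf


# On the board (sketch, not run here):
#   import numpy as np
#   from pynq import Overlay, allocate
#   ol = Overlay("design_1.bit")
#   in_buf = allocate(shape=(1024,), dtype=np.int32)
#   out_buf = allocate(shape=(1024,), dtype=np.int32)
#   in_buf[:] = samples
#   run_fir_on_hardware(ol.fir_dma, in_buf, out_buf)


class _RecordingChannel:
    """Stand-in for one DMA channel that only records calls (off-board demo)."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def transfer(self, buf):
        self.log.append((self.name, "transfer", len(buf)))

    def wait(self):
        self.log.append((self.name, "wait"))


class FakeDMA:
    """Hypothetical stand-in with sendchannel/recvchannel attributes."""

    def __init__(self):
        self.log = []
        self.sendchannel = _RecordingChannel("send", self.log)
        self.recvchannel = _RecordingChannel("recv", self.log)


fake = FakeDMA()
run_fir_on_hardware(fake, [0] * 8, [0] * 8)
print(fake.log)
```

With Packet Framing enabled, the TLAST that the MM2S channel asserts on the last input sample passes through the filter. That TLAST is what lets the S2MM transfer, and therefore `recvchannel.wait()`, complete.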
This section demonstrates how Memory-Mapped I/O (MMIO) and register control work by implementing a hardware array merger that performs:
[1,3,5] + [2,4,6] => [1,2,3,4,5,6]
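This operation is the merge step of merge sort, and both inputs are assumed to be sorted in ascending order already. Below is a software reference model you can use to check the hardware's output. It is the standard two-pointer algorithm, and `mergeCore.v` is not necessarily written the same way.

```python
def merge_sorted(a, b):
    """Two-pointer merge of two ascending lists into one ascending list."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        # Take the smaller head element; on ties take from `a` first (stable)
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # One input is exhausted: copy the remainder of the other
    out.extend(a[i:])
    out.extend(b[j:])
    return out


print(merge_sorted([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```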
Reference: For a similar design, watch Make an RTL-based IP work with PYNQ - AXI Lite adder
Create a new Vivado project named merge_array following the same steps as Section 1.2.
Create a block design and add the ZYNQ7 Processing System.
Select Tools → Create and Package New IP from the menu bar.
- Choose Create a new AXI4 peripheral
- Name the peripheral `merge_array` with version `1.0`
- In the Add Interfaces page, set Number of Registers to 5
- Select Edit IP and click Finish. Vivado will open a new project window for editing the IP.
You will now customize the auto-generated AXI4-Lite interface template.
Open the Sources window and locate merge_array_v1_0_S00_AXI.v.
Add New Signal Declarations

Locate the line `reg aw_en;` and add the following signals immediately after it:

```verilog
reg aw_en;
// Add these new signals:
wire fsmStart;
wire sortDone;
reg fifo1_wr_en;
reg fifo2_wr_en;
wire [31:0] mergedFifoRdData;
wire mergedFifoRdEn;   // read-enable pulse for the merged FIFO
```

Add Control Signal Assignment
Locate the line `assign S_AXI_RVALID = axi_rvalid;` and add the following immediately after it:

```verilog
assign S_AXI_RVALID = axi_rvalid;
// Add this:
assign fsmStart = slv_reg0[0];
```

Replace Register Write Logic
Locate the large `always @(posedge S_AXI_ACLK)` block that handles register writes:

```verilog
always @( posedge S_AXI_ACLK )
begin
  if ( S_AXI_ARESETN == 1'b0 )
    begin
      slv_reg0 <= 0;
      slv_reg1 <= 0;
      slv_reg2 <= 0;
      slv_reg3 <= 0;
      slv_reg4 <= 0;
    end
  else begin
    if (slv_reg_wren)
      begin
        case ( axi_awaddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] )
          // ... all the case statements ...
        endcase
      end
  end
end
```

Delete this entire block and replace it with:
```verilog
// slv_reg0 handling: bit 0 is the start bit (fsmStart), cleared when the merge is done
always @( posedge S_AXI_ACLK )
begin
  if ( S_AXI_ARESETN == 1'b0 )
    begin
      slv_reg0 <= 0;
    end
  else begin
    if (sortDone)
      slv_reg0 <= 0;
    // A CPU write in the same cycle takes priority (last assignment wins)
    if (slv_reg_wren && axi_awaddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] == 0)
      slv_reg0 <= S_AXI_WDATA;
  end
end

// slv_reg1 handling: status, set to 1 when the merge is done
always @( posedge S_AXI_ACLK )
begin
  if ( S_AXI_ARESETN == 1'b0 )
    begin
      slv_reg1 <= 0;
    end
  else begin
    if (sortDone)
      slv_reg1 <= 1;
    if (slv_reg_wren && axi_awaddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] == 1)
      slv_reg1 <= S_AXI_WDATA;
  end
end

// fifo1_wr_en handling (replaces slv_reg3): one-cycle pulse per write to 0x0C.
// slv_reg0 is assigned only in its own block above, so it is not multi-driven.
always @( posedge S_AXI_ACLK )
begin
  if ( S_AXI_ARESETN == 1'b0 )
    begin
      fifo1_wr_en <= 0;
    end
  else begin
    if (slv_reg_wren && axi_awaddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] == 3)
      fifo1_wr_en <= 1'b1;
    else
      fifo1_wr_en <= 1'b0;
  end
end

// fifo2_wr_en handling (replaces slv_reg4): one-cycle pulse per write to 0x10
always @( posedge S_AXI_ACLK )
begin
  if ( S_AXI_ARESETN == 1'b0 )
    begin
      fifo2_wr_en <= 0;
    end
  else begin
    if (slv_reg_wren && axi_awaddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] == 4)
      fifo2_wr_en <= 1'b1;
    else
      fifo2_wr_en <= 1'b0;
  end
end
```

Update Register Read Logic
Locate the `always @(*)` block for address decoding:

```verilog
always @(*)
begin
  // Address decoding for reading registers
  case ( axi_araddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] )
    3'h0 : reg_data_out <= slv_reg0;
    3'h1 : reg_data_out <= slv_reg1;
    3'h2 : reg_data_out <= slv_reg2;
    3'h3 : reg_data_out <= slv_reg3;
    3'h4 : reg_data_out <= slv_reg4;
    default : reg_data_out <= 0;
  endcase
end
```

Replace with:
```verilog
always @(*)
begin
  // Address decoding for reading registers
  case ( axi_araddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] )
    3'h0 : reg_data_out <= slv_reg0;
    3'h1 : reg_data_out <= slv_reg1;
    3'h2 : reg_data_out <= mergedFifoRdData;  // head of the merged FIFO
    default : reg_data_out <= 0;
  endcase
end

// Pop one word from the merged FIFO each time register 2 (0x08) is read
assign mergedFifoRdEn = slv_reg_rden && (axi_araddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] == 2);
```

Instantiate the Merge Core
Find the "User logic" section near the end of the file. Replace:

```verilog
// Add user logic here

// User logic ends
```

With:
```verilog
// Add user logic here
mergeCore mc(
    .clock(S_AXI_ACLK),
    .reset(!S_AXI_ARESETN),
    .start(fsmStart),
    .fifoWrData(S_AXI_WDATA),
    .fifo1WrEn(fifo1_wr_en),
    .fifo2WrEn(fifo2_wr_en),
    .mergedFifoRdEn(mergedFifoRdEn),
    .mergedFifoRdData(mergedFifoRdData),
    .done(sortDone)
);
// User logic ends
```

The modified register map:

| Register Address | Modified Function |
|---|---|
| 0x00 (slv_reg0) | Start bit + auto-clears when done |
| 0x04 (slv_reg1) | Status register (set to 1 when done) |
| 0x08 (slv_reg2) | READ: Merged FIFO output |
| 0x0C (slv_reg3) | WRITE: FIFO1 input (generates pulse) |
| 0x10 (slv_reg4) | WRITE: FIFO2 input (generates pulse) |
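
The byte offsets in the table map to register indices through the `axi_awaddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB]` slice. The sketch below models that decode and the rewritten write logic one clock edge at a time. It assumes the template's usual values for a 32-bit, 5-register peripheral (`ADDR_LSB = 2`, `OPT_MEM_ADDR_BITS = 2`, consistent with the 3-bit `3'hN` case labels). `reg_index` and `simulate_writes` are hypothetical helpers that only illustrate the Verilog. They are not part of the IP.

```python
ADDR_LSB = 2           # 32-bit registers: byte offset = 4 * register index
OPT_MEM_ADDR_BITS = 2  # 3-bit register index (matches the 3'hN case labels)


def reg_index(byte_addr):
    """Model of axi_awaddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB]."""
    return (byte_addr >> ADDR_LSB) & ((1 << (OPT_MEM_ADDR_BITS + 1)) - 1)


def simulate_writes(writes, done_edge=None):
    """Clock-edge model of the rewritten write logic.

    writes[e] is (byte_addr, wdata) when a write completes at edge e
    (slv_reg_wren high), else None. sortDone is high only at `done_edge`.
    Returns (slv_reg0, slv_reg1, fifo1_wr_en, fifo2_wr_en) after each edge.
    """
    slv_reg0 = slv_reg1 = 0
    trace = []
    for edge, w in enumerate(writes):
        idx = reg_index(w[0]) if w else None
        # Non-blocking semantics: the later "if" in each always block wins
        if edge == done_edge:
            slv_reg0, slv_reg1 = 0, 1
        if idx == 0:
            slv_reg0 = w[1]
        if idx == 1:
            slv_reg1 = w[1]
        # The enables are recomputed on every edge, so each is a 1-cycle pulse
        fifo1_wr_en = int(idx == 3)
        fifo2_wr_en = int(idx == 4)
        trace.append((slv_reg0, slv_reg1, fifo1_wr_en, fifo2_wr_en))
    return trace


writes = [(0x0C, 1), (0x0C, 3), None, (0x10, 2), (0x00, 1), None, None]
for row in simulate_writes(writes, done_edge=6):
    print(row)
```

In the trace, each write to 0x0C or 0x10 raises the matching FIFO write enable for exactly one edge. The start bit written at edge 4 stays set until `sortDone` clears it at edge 6, and at that same edge the status register goes to 1.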
You have now turned the passive register file of a standard AXI4 peripheral template into an active hardware controller with its own register control and handshaking logic.
The `mergeCore` module instantiated in the user logic section must now be added to the IP project:
- In the Sources panel, click the + button
- Select Add or create design sources
- Click Add Files and navigate to `hw_files/mergeCore.v`
- Click OK to add the file
You'll notice that mergeCore.v references FIFO modules that don't exist yet (indicated by red question marks).
- Click IP Catalog in the Project Manager section
- Search for FIFO Generator
- Double-click to open the configuration wizard
Configure the FIFOs:
- Keep **Interface Type**: Native
- Keep **Clocking Mode**: Common Clock Block RAM
- Navigate to the **Native Ports** tab for the depth and width settings below

Create two FIFO configurations:
- `arrayFifo` (for input arrays):
  - Component Name: `arrayFifo`
  - Write/Read Depth: 1024
  - Data Width: 32 bits
- `mergedFifo` (for output):
  - Component Name: `mergedFifo`
  - Write/Read Depth: 2048
  - Data Width: 32 bits

Important: The component names must exactly match the module names instantiated in `mergeCore.v`.
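The depths are chosen so that the output can hold everything the inputs can: two full 1024-entry input FIFOs produce at most 2048 merged words. A Python behavioural model of a fixed-depth FIFO makes this capacity rule concrete. `BoundedFifo` is a hypothetical helper, and it approximates the FIFO Generator only loosely: writes while full are ignored, and latency and status-flag timing are not modelled.

```python
from collections import deque


class BoundedFifo:
    """Behavioural model of a fixed-depth FIFO (no latency or flag timing)."""

    def __init__(self, depth):
        self.depth = depth
        self._words = deque()

    @property
    def full(self):
        return len(self._words) >= self.depth

    @property
    def empty(self):
        return not self._words

    def write(self, word):
        """Accept a 32-bit word; a write while full is ignored (returns False)."""
        if self.full:
            return False
        self._words.append(word & 0xFFFFFFFF)
        return True

    def read(self):
        """Pop the oldest word; check `empty` first."""
        return self._words.popleft()


array_fifo_1, array_fifo_2 = BoundedFifo(1024), BoundedFifo(1024)
merged_fifo = BoundedFifo(2048)

# Capacity rule: the merged FIFO can hold both full inputs
assert array_fifo_1.depth + array_fifo_2.depth <= merged_fifo.depth

accepted = sum(array_fifo_1.write(i) for i in range(1025))
print(accepted, array_fifo_1.full)  # 1024 True
```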
After creating both FIFOs:
- Navigate to Package IP → Edit Packaged IP
- Review the Identification tab to note the IP version
- Select the File Groups tab. You should see 0 files included for the "Standard > Synthesis" file group.
Click on "Merge changes from File Groups Wizard". You should now see the hardware source files included.
If the above option does not show up, an alternate method is to add files manually. You can do so by right-clicking "Synthesis" file group, then selecting "Add Files..."
Hints:
- File order matters here.
- When adding files manually, you may need to set the Model Name entry yourself so that Vivado can identify the top-level file.
- Next, under Review and Package, click Re-Package IP
- Close the IP project and return to your main `merge_array` project
- In the block diagram canvas, click the + button
- Search for and add `merge_array_v1_0`
- Run Connection Automation to connect the IP to the ZYNQ PS
- Right-click `design_1` and select Create HDL Wrapper. Choose Let Vivado manage wrapper and auto-update
- Click Generate Bitstream and wait for completion
- Export the three required files:
  - TCL: File → Export → Export Block Design
  - HWH: `merge_array/merge_array.gen/sources_1/bd/design_1/hw_handoff/design_1.hwh`
  - BIT: `merge_array/merge_array.runs/impl_1/design_1_wrapper.bit` (rename to `design_1.bit`)
- Upload all three files to a folder on your PYNQ Jupyter Notebook interface
With the hardware complete, you need software drivers to control it from Python.
Pre-written drivers are provided in the drivers/merge_driver folder. This section explains how they work and how to install them.
Transfer the driver files to your PYNQ board using either SSH (PuTTY) or the Jupyter Notebook terminal.
File Structure:

```
/home/xilinx/pynq/lib/
├── merge.py
└── _pynq/
    └── _merge/
        ├── merge_driver.cpp
        ├── merge_driver.h
        └── Makefile
```
Build the Shared Library:
- Navigate to the driver directory:
  ```bash
  cd /home/xilinx/pynq/lib/_pynq/_merge
  ```
- Compile the C++ code:
  ```bash
  make
  ```
- Copy the compiled library:
  ```bash
  cp libmerge.so ../../
  ```
The driver creates a bridge between Python and your custom FPGA hardware, enabling register control and data transfer between the PS (ARM processor) and PL (FPGA fabric).
Header File (merge_driver.h):
Defines memory-mapped register offsets for the merge IP core. These offsets are accessed as BaseAddr + REGISTER_OFFSET.
Source File (`merge_driver.cpp`):
Contains `merge_read()` and `merge_write()` functions that access hardware registers through a volatile pointer, e.g.:

```cpp
*(volatile uint32_t *)addr = data;
```

The `volatile` keyword ensures the compiler doesn't optimize away register accesses.
The `merge()` Function:
- Writes each element of arrays `a` and `b` to `MERGE_1_REG` and `MERGE_2_REG`
- Triggers the merge operation by writing `0x1` to `MERGE_CTRL_REG`
- Polls `MERGE_STATUS_REG` until completion (reads 0)
- Reads merged results from `MERGE_RESULT_REG` into `BufAddr`
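You can exercise the same register protocol without a board. The sketch below is a Python rendering of the `merge()` steps, run against a hypothetical in-memory stand-in for the IP. It assumes the offsets from the hardware register table; the real values live in `merge_driver.h`. It polls the start bit at 0x00, which the table says auto-clears when the merge is done. `FakeMergeIP` and `merge_via_mmio` are illustrative names. The `mmio` argument only needs `read(offset)` and `write(offset, value)`, the same shape as PYNQ's `MMIO` class.

```python
from collections import deque

# Offsets assumed from the hardware register table
CTRL, STATUS, RESULT, FIFO1, FIFO2 = 0x00, 0x04, 0x08, 0x0C, 0x10


class FakeMergeIP:
    """Hypothetical transaction-level stand-in for the merge IP's registers."""

    def __init__(self):
        self.regs = {CTRL: 0, STATUS: 0}
        self.fifo1, self.fifo2, self.merged = [], [], deque()

    def write(self, offset, value):
        if offset == FIFO1:
            self.fifo1.append(value)
        elif offset == FIFO2:
            self.fifo2.append(value)
        else:
            self.regs[offset] = value
            if offset == CTRL and value & 1:
                # Stand-in for mergeCore: finish instantly, clear start, set status
                self.merged.extend(sorted(self.fifo1 + self.fifo2))
                self.fifo1, self.fifo2 = [], []
                self.regs[CTRL], self.regs[STATUS] = 0, 1

    def read(self, offset):
        if offset == RESULT:
            return self.merged.popleft()  # each read pops one merged word
        return self.regs[offset]


def merge_via_mmio(mmio, a, b):
    """Python rendering of the merge() steps: fill FIFOs, start, poll, read back."""
    for x in a:
        mmio.write(FIFO1, x)             # MERGE_1_REG
    for x in b:
        mmio.write(FIFO2, x)             # MERGE_2_REG
    mmio.write(CTRL, 0x1)                # start the merge
    while mmio.read(CTRL) & 0x1:         # start bit auto-clears when done
        pass
    return [mmio.read(RESULT) for _ in range(len(a) + len(b))]


print(merge_via_mmio(FakeMergeIP(), [1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```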
The extern "C" linkage prevents C++ name mangling, enabling CFFI to call the function.
How Python Controls C++ Functions:
The Python driver uses CFFI (C Foreign Function Interface) to call compiled C++ code:

```python
self._libmerge = self._ffi.dlopen(os.path.join(LIB_SEARCH_PATH, "libmerge.so"))
```

CFFI handles type conversions automatically, providing a clean interface between Python and C/C++.
Why PYNQ? Traditional FPGA development requires writing low-level C drivers. PYNQ simplifies this by providing pre-written drivers wrapped in Python, making FPGA development accessible to software engineers.
Function Declarations:
The `cdef()` calls declare C function signatures:

```python
self._ffi.cdef("void merge(unsigned int BaseAddr, ...);")
```

MergeIP Class:
Inherits from DefaultIP, integrating with PYNQ's overlay system:
- **`bindto`**: Specifies which Vivado IP block this driver controls (matches the VLNV identifier)
- **`self.mmio.array`**: Provides memory-mapped access to IP registers
- **`self.buffer`**: NumPy array for storing the merged output
**Data Flow: Python Arrays to C++ Pointers**
When you pass Python lists to the driver, they are converted to C++ pointers through CFFI. Here's the complete flow:
1. **Python Function Call:**
   ```python
   result = merge_ip.merge([1, 3, 5], [2, 4, 6])
   ```
2. **Convert to NumPy Arrays:**
   ```python
   a = numpy.array(a, dtype=numpy.uint32)  # [1, 3, 5]
   b = numpy.array(b, dtype=numpy.uint32)  # [2, 4, 6]
   ```
3. **Get Memory Addresses:** A freshly created NumPy array stores its data in one contiguous block of memory. The `ctypes.data` attribute returns the address of that block:
   ```python
   a.ctypes.data  # Memory address as integer
   b.ctypes.data  # Memory address as integer
   ```
4. **Cast to C Pointers:** CFFI converts the integer addresses to C-style pointers:
   ```python
   a_ptr = self._ffi.cast("unsigned int *", a.ctypes.data)
   b_ptr = self._ffi.cast("unsigned int *", b.ctypes.data)
   ```
5. **Call C++ Function:**
   ```python
   self._libmerge.merge(
       self._base_addr,  # BaseAddr
       c_buf,            # BufAddr
       a_ptr,            # unsigned int *a
       a_size,           # a_size
       b_ptr,            # unsigned int *b
       b_size            # b_size
   )
   ```

Zero-Copy Access: the C++ code reads the NumPy arrays' memory directly, so no data is copied between Python and C++. Both sides work on the same buffer. (Step 2 does build new NumPy arrays from the Python lists; zero-copy applies from that point on.)
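You can check the zero-copy claim on any machine. The sketch below uses Python's standard `ctypes` module in place of CFFI. This is a swapped-in technique, chosen because it needs no extra package. It casts `a.ctypes.data` to an `unsigned int *` and writes through the pointer. It also illustrates an assumption the driver relies on: the array must be contiguous, and it must stay alive, for a raw pointer to be valid.

```python
import ctypes

import numpy as np

a = np.array([1, 3, 5], dtype=np.uint32)   # contiguous buffer owned by NumPy
assert a.flags["C_CONTIGUOUS"]

# Reinterpret the raw address as a C "unsigned int *" (ctypes stands in for
# the CFFI cast used by the driver). `a` must outlive `ptr`.
ptr = ctypes.cast(a.ctypes.data, ctypes.POINTER(ctypes.c_uint32))

print(ptr[1])        # 3: reading through the pointer sees NumPy's data
ptr[1] = 42          # writing through the pointer...
print(a.tolist())    # [1, 42, 5]: ...changes the NumPy array, no copy involved

# A strided view is not contiguous, so a raw pointer would not see its
# elements back-to-back; make a contiguous copy before handing it to C.
view = a[::2]
safe = np.ascontiguousarray(view)
print(view.flags["C_CONTIGUOUS"], safe.flags["C_CONTIGUOUS"])  # False True
```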
- Upload the provided Jupyter Notebook from `jupyter_notebook/lab1/Merge.ipynb` to your PYNQ board
- Ensure the three hardware files (`.bit`, `.hwh`, `.tcl`) are in the same directory
- Open the notebook and execute all cells
- Verify that the merged array is correctly printed
The hardware accelerator should successfully merge the two input arrays, demonstrating the complete hardware-software integration using MMIO and custom IP cores.
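If the IP's name or version (its VLNV) does not match the driver's `bindto` entry, PYNQ will not attach the custom driver. In that case, `overlay.<ip_name>` falls back to the generic `DefaultIP` class instead of the class from `pynq.lib`. You can check which driver was bound with:

```python
print(type(overlay.merge_v1_0_0))
# Should show <class 'pynq.lib.merge.MergeIP'>
```

If it shows `DefaultIP` instead, refer to debug.md for guidance.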