
aukhalid/evpix_rv32


EVPIX-RV32: 5-Stage Custom RISC-V SoC with Integrated IPU and TinyML Support for Real-Time Edge-Vision AI Acceleration


A custom 5-stage edge-vision processor that combines RISC-V programmability with a hardware-accelerated Image Processing Unit (IPU) and TinyML inference. The design was verified with BIST testbenches, prototyped in hardware on a Digilent Basys-3 (AMD Artix-7) FPGA, and finally implemented through a full RTL-to-GDSII flow as a SkyWater 130-nm CMOS ASIC.




🔭 Overview

EVPIX-RV32 is a custom 5-stage pipelined RISC-V RV32I processor with hardware image-processing extensions, a streaming Image Processing Unit (IPU), and TinyML support, built for real-time edge-vision AI acceleration. The architecture extends the RISC-V ISA with custom instructions for grayscale conversion, thresholding, Sobel edge detection, and 2D convolution in the custom-0 opcode space. These instructions drive an autonomous IPU that processes 128×128 frames directly in the shared data memory.

The system was prototyped on the Digilent Basys-3 AMD Artix-7 FPGA with an OV7670 camera and dual-region VGA display, sustaining 60 FPS with zero frame drops. A hardware Built-In Self-Test (BIST) mode runs 61 instructions and shows pass/fail results on the VGA monitor, while a TinyML finger-counting demo validates lightweight neural inference on the platform.

The design was implemented through the open-source OpenROAD flow targeting SkyWater 130-nm CMOS. The result is a DRC-clean, LVS-equivalent GDSII layout on a 0.58 mm² die (760 µm × 760 µm) that operates at 62 MHz with 3.76 mW total power.

Thesis: Bachelor of Science (B.Sc.)
Department: Electronics and Telecommunication Engineering
Institution: Chittagong University of Engineering & Technology (CUET)
Author: Ahasan Ullah Khalid (2008051)
Supervisor: Md. Farhad Hossain, Assistant Professor, Department of ETE


🏗️ System Architecture

Top-Level Architecture


The EVPIX-RV32 system is a heterogeneous vision SoC that combines a general-purpose 32-bit RISC-V processor with dedicated image-processing hardware. The architecture uses a Harvard-style memory organization with separate instruction and data memories. The data memory is dual-ported, so the CPU and IPU can access it concurrently.

System Components

| Component | Description |
|---|---|
| RV32I Core | 5-stage pipelined processor with full hazard detection and forwarding |
| Image Processing Unit (IPU) | Dedicated hardware accelerator for 7 image-processing operations |
| Memory Subsystem | 64 KB unified data memory (dual-port BRAM) + instruction ROM |
| Camera Interface | OV7670 parallel DVP with SCCB configuration (128×128 @ 30 FPS) |
| Display Interface | VGA controller (640×480 @ 60 Hz) with double-buffered frame display |
| TinyML Accelerator | Hardware feature extractor + classifier for finger counting |
| Control/Status I/O | 16 slide switches and 16 LEDs for mode selection and status |

Detailed System Block Diagram


The system interconnect (AXI Bus) enables communication between the RV32I core, IPU, TinyML accelerator, unified data memory, and peripheral interfaces. The camera module feeds raw pixel data through the OV7670 DVP interface, while the VGA display controller outputs processed frames with real-time performance overlays.


✨ Key Features

  • Complete RV32I Base ISA — All 40+ integer instructions with full forwarding and hazard detection
  • 8 Custom Image-Processing Instructions — GRAYSCALE, THRESH, SOBEL, CONV, VDOT, RELU, HACC, OTSU
  • 7 Hardware-Accelerated IPU Operations — Grayscale, Threshold, Sobel Edge, 2D Convolution, Max Pool, Avg Pool, Max Pixel
  • Real-Time 60 FPS Processing — Zero frame drops at 128×128 resolution
  • Hardware BIST Mode — 61-instruction regression with VGA pass/fail display
  • TinyML Finger Counting — Real-time gesture recognition (0-5 fingers)
  • Open-Source ASIC Flow — Complete RTL-to-GDSII using OpenROAD + SkyWater 130nm
  • Low Power — 3.76 mW total power at 62.06 MHz in 130nm CMOS

🔧 Hardware Architecture

RV32I 5-Stage Pipeline


The processor implements a classic five-stage RISC pipeline with full hazard detection and data forwarding:

| Stage | Function | Key Components |
|---|---|---|
| IF (Instruction Fetch) | Reads the next instruction from memory | Program Counter, Instruction Memory, PC Incrementer |
| ID (Instruction Decode) | Decodes the instruction and reads registers | Register File (32×32-bit), Immediate Generator, Control Unit |
| EX (Execute) | Performs ALU operations and branch decisions | ALU, Branch Comparator, Forwarding MUXes, IPU Interface |
| MEM (Memory Access) | Reads/writes data memory | Data Memory, Load/Store Logic, Byte/Halfword/Word access |
| WB (Write Back) | Writes results to the register file | Result MUX (ALU/Mem/PC+4), Register File Write Port |

Pipeline Timing Diagram


The pipeline achieves near-ideal IPC for sequential code with single-cycle branch resolution and full forwarding paths.
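
The forwarding and load-use checks that a classic five-stage pipeline relies on can be sketched as a small Python reference model. It follows the textbook (Patterson & Hennessy-style) conditions. The field names and the 2-bit select encoding are assumptions and may not match `forwarding_unit.sv` or `hazard_detection_unit.sv` in this repository.

```python
from dataclasses import dataclass

# Forwarding-mux select encodings (textbook convention; assumed, not read from the RTL)
FWD_NONE, FWD_WB, FWD_MEM = 0b00, 0b01, 0b10

@dataclass
class PipeRegs:
    ex_mem_reg_write: bool  # instruction in EX/MEM will write a register
    ex_mem_rd: int
    mem_wb_reg_write: bool  # instruction in MEM/WB will write a register
    mem_wb_rd: int

def forward_select(src: int, p: PipeRegs) -> int:
    """Select the operand source for the ID/EX source register `src` (rs1 or rs2)."""
    # The newer result (EX/MEM) takes priority over the older one (MEM/WB); x0 is never forwarded.
    if p.ex_mem_reg_write and p.ex_mem_rd != 0 and p.ex_mem_rd == src:
        return FWD_MEM
    if p.mem_wb_reg_write and p.mem_wb_rd != 0 and p.mem_wb_rd == src:
        return FWD_WB
    return FWD_NONE  # use the register-file value latched in ID/EX

def load_use_stall(id_ex_mem_read: bool, id_ex_rd: int, if_id_rs1: int, if_id_rs2: int) -> bool:
    """A load in EX whose rd is read by the next instruction forces one bubble."""
    return id_ex_mem_read and id_ex_rd != 0 and id_ex_rd in (if_id_rs1, if_id_rs2)

# Example: `add x3, x1, x2` directly after `addi x1, x0, 10`
regs = PipeRegs(ex_mem_reg_write=True, ex_mem_rd=1, mem_wb_reg_write=False, mem_wb_rd=0)
print(forward_select(1, regs), forward_select(2, regs))  # -> 2 0
```

With full forwarding, only the load-use case costs a stall cycle. That is why sequential ALU code can approach an IPC of one.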

Individual Stage Diagrams

Instruction Fetch (IF) Stage

Instruction Decode (ID) Stage

Execute (EX) Stage

Memory Access (MEM) Stage

Write Back (WB) Stage

Image Processing Unit (IPU)


The IPU is a dedicated hardware accelerator controlled through custom R-type instructions. It features:

  • FSM Controller — 16 states from idle to finish
  • 3×3 Window Registers — win0-win8 pixel buffer for convolution operations
  • Kernel Coefficient ROM — 6 built-in kernels (Identity, Sobel X, Sobel Y, Gaussian Blur, Sharpen, Edge Detect)
  • Sobel Gradient Unit — Computes Gx, Gy, and gradient magnitude
  • Pooling Unit — 2×2 max/average pooling
  • Pixel Write/ALU Logic — Gray, thresh, conv, sobel, pool selection
  • Memory Interface — Direct dual-port BRAM access
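
The Sobel path above can be mirrored in software as a golden model for checking IPU output. This sketch makes three assumptions about the RTL:

- the win0–win8 window is ordered row-major (win0 = top-left);
- the magnitude uses the common hardware approximation |Gx| + |Gy|;
- the result is saturated to 8 bits.

The RTL may differ on any of these points.

```python
# Row-major 3x3 window: win[0..2] top row, win[3..5] middle row, win[6..8] bottom row (assumed)
SOBEL_X = (-1, 0, 1,
           -2, 0, 2,
           -1, 0, 1)
SOBEL_Y = (-1, -2, -1,
            0,  0,  0,
            1,  2,  1)

def sobel_pixel(win: list[int]) -> int:
    """Gradient magnitude for one output pixel from a 3x3 window of 8-bit pixels."""
    if len(win) != 9:
        raise ValueError("expected 9 pixels (win0..win8)")
    gx = sum(k * p for k, p in zip(SOBEL_X, win))
    gy = sum(k * p for k, p in zip(SOBEL_Y, win))
    # |Gx| + |Gy| approximates sqrt(Gx^2 + Gy^2) without a multiplier or square root
    return min(abs(gx) + abs(gy), 255)  # saturate to 8 bits

flat = [100] * 9                 # uniform region: no edge
vertical_edge = [0, 0, 255] * 3  # dark on the left, bright on the right
print(sobel_pixel(flat), sobel_pixel(vertical_edge))  # -> 0 255
```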

Supported IPU Operations

| Operation | Description | Throughput (128×128 @ 100 MHz) |
|---|---|---|
| Grayscale | RGB to luminance: Y = (77R + 150G + 29B) >> 8 | 1,525 FPS |
| Threshold | Binary threshold at a configurable level | 1,525 FPS |
| Max Pixel | Finds the maximum pixel value in the image | 1,525 FPS |
| Sobel Edge | 3×3 gradient-magnitude computation | 1,525 FPS |
| 2D Convolution | Programmable-kernel convolution | 1,220 FPS |
| Max Pool | Non-overlapping 2×2 max pooling | 1,220 FPS |
| Avg Pool | Non-overlapping 2×2 average pooling | 4,882 FPS |
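
The point-wise and pooling operations in the table can be written as a compact Python reference model. The grayscale weights come straight from the table: they sum to 256, so the `>> 8` keeps white at 255. Three details are assumptions rather than documented behaviour:

- the threshold compares with `>=`;
- average pooling truncates (`>> 2`);
- the image is a list of rows.

```python
def grayscale(r: int, g: int, b: int) -> int:
    """Y = (77R + 150G + 29B) >> 8; the weights sum to 256, so white maps to 255."""
    return (77 * r + 150 * g + 29 * b) >> 8

def threshold(y: int, level: int) -> int:
    """Binary threshold: 255 at or above `level`, else 0 (comparison direction assumed)."""
    return 255 if y >= level else 0

def pool2x2(img: list[list[int]], mode: str = "max") -> list[list[int]]:
    """Non-overlapping 2x2 pooling: an H x W image becomes (H/2) x (W/2)."""
    out = []
    for y in range(0, len(img) - 1, 2):
        row = []
        for x in range(0, len(img[0]) - 1, 2):
            block = (img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1])
            row.append(max(block) if mode == "max" else sum(block) >> 2)  # >> 2 divides by 4
        out.append(row)
    return out

print(grayscale(255, 255, 255), grayscale(255, 0, 0))  # -> 255 76
print(pool2x2([[1, 2], [3, 8]], "max"), pool2x2([[1, 2], [3, 8]], "avg"))  # -> [[8]] [[3]]
```

On the 128×128 frames handled by the IPU, pooling produces a 64×64 result.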

Memory Subsystem

The memory map is organized as follows:

| Address Range | Size | Content |
|---|---|---|
| 0x0000_0000 – 0x0000_BFFF | 48 KB | 128×128 RGB888 source image buffer |
| 0x0000_C000 – 0x0000_FFFF | 16 KB | 128×128 8-bit processed output buffer |

The 64KB unified data memory uses dual-port BRAM supporting concurrent CPU load/store and IPU direct memory access.
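
The buffer sizes follow from the frame format:

- source buffer: 128 × 128 × 3 bytes = 49,152 bytes = 48 KB;
- output buffer: 128 × 128 × 1 byte = 16 KB.

The helpers below are a sketch of how a pixel coordinate maps to a byte address. They assume row-major order with the three RGB888 bytes of a pixel packed back to back. The README does not specify the byte order within a pixel, so treat the helpers as illustrative.

```python
SRC_BASE, OUT_BASE = 0x0000_0000, 0x0000_C000
WIDTH = HEIGHT = 128

def src_addr(x: int, y: int) -> int:
    """Byte address of the first byte of pixel (x, y) in the RGB888 source buffer."""
    # Assumed layout: 3 bytes per pixel, row-major, no padding between rows
    return SRC_BASE + 3 * (y * WIDTH + x)

def out_addr(x: int, y: int) -> int:
    """Byte address of pixel (x, y) in the 8-bit processed output buffer (row-major)."""
    return OUT_BASE + y * WIDTH + x

# The last pixel of each buffer ends exactly at that buffer's upper bound
print(hex(src_addr(127, 127) + 2), hex(out_addr(127, 127)))  # -> 0xbfff 0xffff
```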

Camera & Display Interfaces

OV7670 Camera Module


The OV7670 CMOS camera module connects via parallel DVP interface with:

  • 8-bit pixel data bus
  • PCLK (pixel clock), HREF (horizontal sync), VSYNC (vertical sync)
  • SCCB (I2C-compatible) configuration bus
  • 128×128 resolution at 30 FPS native capture rate
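
On the 8-bit DVP bus, each RGB565 pixel arrives as two bytes. The project's `ov7670_capture_128_rgb565_to_rgb888.sv` module name suggests the capture path widens these to RGB888. This sketch assumes the usual OV7670 byte order (first byte carries R[4:0] and G[5:3]). It widens each channel by bit replication, which is one common choice; the RTL might instead use a plain left shift.

```python
def rgb565_to_rgb888(first: int, second: int) -> tuple[int, int, int]:
    """Join two DVP bytes into one RGB565 pixel and widen each channel to 8 bits."""
    pixel = (first << 8) | second  # assumed order: first byte = R[4:0] G[5:3]
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Bit replication maps full scale to full scale (31 -> 255, 63 -> 255)
    r8 = (r5 << 3) | (r5 >> 2)
    g8 = (g6 << 2) | (g6 >> 4)
    b8 = (b5 << 3) | (b5 >> 2)
    return r8, g8, b8

print(rgb565_to_rgb888(0xF8, 0x00))  # -> (255, 0, 0)
```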

VGA Display Interface


The Basys-3 VGA interface provides:

  • 12-bit RGB (4-bit per channel) via resistor-DAC network
  • 640×480 @ 60Hz standard timing
  • Dual-region display: original frame (left) + processed frame (right)
  • On-screen performance overlays (FPS, cycle counts, mode status)
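
The 640×480 @ 60 Hz mode uses fixed horizontal and vertical timing: 800 pixel clocks per line and 525 lines per frame, with active-low sync pulses. The counter logic a VGA controller implements can be modelled as below. Two parts are assumptions, not taken from `vga_640x480.sv`:

- a 25 MHz pixel clock (100 MHz divided by 4), the usual choice on the Basys-3;
- `region`, a hypothetical helper showing how a dual-region display could pick its source.

```python
# Industry-standard 640x480 @ 60 Hz timing (pixels per line / lines per frame)
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK  # 800
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK  # 525

def vga_signals(h: int, v: int) -> tuple[bool, bool, bool]:
    """Return (hsync_n, vsync_n, display_enable) for counter values h, v."""
    in_hsync = H_VISIBLE + H_FRONT <= h < H_VISIBLE + H_FRONT + H_SYNC
    in_vsync = V_VISIBLE + V_FRONT <= v < V_VISIBLE + V_FRONT + V_SYNC
    active = h < H_VISIBLE and v < V_VISIBLE
    return (not in_hsync, not in_vsync, active)  # both syncs are active-low in this mode

def region(h: int) -> str:
    """Hypothetical dual-region split: left half = source, right half = processed."""
    return "source" if h < H_VISIBLE // 2 else "processed"

pixel_clock_hz = 25_000_000  # assumed: 100 MHz board clock divided by 4
print(H_TOTAL, V_TOTAL, round(pixel_clock_hz / (H_TOTAL * V_TOTAL), 2))  # -> 800 525 59.52
```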

TinyML Accelerator


The TinyML subsystem includes:

  • Hardware Feature Extractor — Skin-color detection and finger-region segmentation
  • Classifier — Quantized neural network for finger-counting (0-5 classes)
  • Temporal Stability Filter — Reduces jitter between frames
  • Integration — Results overlaid on VGA display in real-time
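
The README does not describe how the temporal stability filter works. One common way to suppress frame-to-frame jitter is a majority vote over a short history. The class below is a hypothetical sketch of that idea, not the design's filter. Its window length of 5 frames is an arbitrary assumption.

```python
from collections import Counter, deque

class StabilityFilter:
    """Report a new finger count only when it holds a strict majority of the last `window` frames."""

    def __init__(self, window: int = 5):
        self.history: deque[int] = deque(maxlen=window)
        self.stable = 0  # last reported count

    def update(self, raw_count: int) -> int:
        self.history.append(raw_count)
        count, votes = Counter(self.history).most_common(1)[0]
        if votes > len(self.history) // 2:  # strict majority of the buffered frames
            self.stable = count
        return self.stable

f = StabilityFilter(window=5)
print([f.update(c) for c in [2, 2, 3, 2, 3, 3, 3]])  # -> [2, 2, 2, 2, 2, 3, 3]
```

The single-frame flickers to 3 are ignored. The output only switches once 3 becomes the majority.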

📝 Custom ISA Extensions

The processor extends RV32I with 8 custom instructions encoded in the custom-0 opcode space (0001011):

| Instruction | Opcode | funct3 | funct7 | Description |
|---|---|---|---|---|
| GRAYSCALE | 0001011 | 000 | kernel[3:0], op=0 | Convert RGB to grayscale |
| THRESH | 0001011 | 000 | kernel[3:0], op=1 | Apply binary threshold |
| SOBEL | 0001011 | 000 | kernel[3:0], op=2 | Sobel edge detection |
| CONV | 0001011 | 000 | kernel[3:0], op=3 | 2D convolution with kernel |
| VDOT | 0001011 | 001 | | Vector dot product for ML |
| RELU | 0001011 | 010 | | Rectified linear activation |
| HACC | 0001011 | 011 | | Histogram accumulation |
| OTSU | 0001011 | 100 | | Otsu threshold calculation |

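The funct3 field selects the IPU operation type (START, STATUS, RESULT, PERF), while funct7 selects the algorithm and kernel. Every encoding sits in the reserved custom-0 opcode space, so none collides with a standard RV32I instruction. Standard RV32I tools and compilers therefore remain usable for the rest of the program.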
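
The custom instructions use the standard R-type field layout, which can be packed into a 32-bit word as sketched below:

- funct7: bits [31:25]
- rs2: bits [24:20]
- rs1: bits [19:15]
- funct3: bits [14:12]
- rd: bits [11:7]
- opcode: bits [6:0]

The table gives funct7 only as "kernel[3:0], op=N". The `ipu_funct7` helper therefore uses a hypothetical split, with the kernel in bits [6:3] and the operation in bits [2:0], and may not match the real decoder.

```python
CUSTOM0 = 0b0001011  # custom-0 major opcode (0x0B)

def encode_r(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int = CUSTOM0) -> int:
    """Pack the fields of a RISC-V R-type instruction into a 32-bit word."""
    for name, value, width in (("funct7", funct7, 7), ("rs2", rs2, 5), ("rs1", rs1, 5),
                               ("funct3", funct3, 3), ("rd", rd, 5), ("opcode", opcode, 7)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def ipu_funct7(kernel: int, op: int) -> int:
    """HYPOTHETICAL funct7 layout: kernel index in bits [6:3], operation code in bits [2:0]."""
    return (kernel << 3) | op

# SOBEL (op=2), kernel 0, funct3 = 000, sources x10/x11, destination x12
word = encode_r(ipu_funct7(kernel=0, op=2), rs2=11, rs1=10, funct3=0b000, rd=12)
print(f"0x{word:08x}")  # -> 0x04b5060b
```

Under the same assumed layout, the GNU assembler's `.insn r 0x0B, 0, 2, x12, x10, x11` directive would emit this word without any toolchain changes.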


🖥️ FPGA Implementation

Basys-3 Board Setup


The Digilent Basys-3 development board features:

  • FPGA: Xilinx Artix-7 XC7A35T-1CPG236C
    • 33,280 logic cells | 20,800 LUTs | 41,600 FFs | 50 BRAM tiles (1,800 Kb) | 90 DSP48E1 slices
  • Clock: 100 MHz onboard oscillator
  • I/O: 16 slide switches, 16 LEDs, 5 pushbuttons, 4-digit 7-segment display
  • Display: VGA port (12-bit RGB)
  • Expansion: 4 Pmod connectors
  • Programming: USB-JTAG via shared UART/JTAG port

Board Component Layout


| Callout | Component | Use in EVPIX-RV32 |
|---|---|---|
| 1 | Power Good LED | Power status indicator |
| 2 | Pmod Ports | OV7670 camera connection |
| 3 | Analog Pmod (XADC) | |
| 4 | 7-Segment Display | Performance counters |
| 5 | Slide Switches (16) | Mode selection (CPU/IPU/BIST/TinyML) |
| 6 | LEDs (16) | Status indicators |
| 7 | Pushbuttons (5) | Reset, user input |
| 8 | FPGA Programming Done LED | Configuration status |
| 9 | FPGA Configuration Reset Button | Hardware reset |
| 10 | Programming Mode Jumper | JTAG/SPI selection |
| 11 | USB Host Connector | Alternative programming |
| 12 | VGA Connector | Monitor output |
| 13 | Shared UART/JTAG USB Port | Programming and debug |
| 14 | External Power Connector | |
| 15 | Power Switch | Board power |
| 16 | Power Select Jumper | USB/External power |

Physical Prototype Setup


The physical prototype connects:

  • OV7670 camera → Pmod-compatible breakout → Basys-3 Pmod port
  • VGA monitor → Basys-3 VGA port via DB15 cable
  • USB power → Basys-3 micro-USB for power and programming

Vivado Design Flow


The FPGA implementation follows the standard Xilinx Vivado flow:

1. New RTL Project → Target: xc7a35tcpg236-1
2. Design Entry → Add SystemVerilog (*.sv) sources + XDC constraints
3. RTL Analysis → Elaborate design, check for issues
4. Synthesis → Area optimization, default strategy
5. Implementation → Placement & Routing, default settings
6. Bitstream Generation → Compression enabled
7. Hardware Programming → FPGA via JTAG (Hardware Manager)

FPGA Resource Utilization

| Resource | Used | Available | Utilization |
|---|---:|---:|---:|
| Slice LUTs | 16,547 | 20,800 | 79.55% |
| Slice Registers | 5,534 | 41,600 | 13.30% |
| Block RAM Tiles | 30 | 50 | 60.00% |
| DSP Slices | 0 | 90 | 0.00% |
| Clock Buffers (BUFG) | 2 | 32 | 6.25% |
| Bonded IOBs | 62 | 106 | 58.49% |

Note: All arithmetic is LUT-based (no DSP slices) for maximum portability across FPGA families and clean ASIC synthesis.

FPGA Power Consumption

| Metric | Value |
|---|---|
| Total On-Chip Power | 216.0 mW |
| Dynamic Power | 142.0 mW (66% of total) |
| Device Static Power | 73.0 mW (34% of total) |
| BRAM Power | 46.0 mW (32% of dynamic) |
| Logic Power | 31.0 mW (22% of dynamic) |
| Signals Power | 27.0 mW (19% of dynamic) |
| I/O Power | 19.0 mW (14% of dynamic) |
| Clocks Power | 19.0 mW (13% of dynamic) |
| Junction Temperature | 26.1 °C |
| Thermal Margin | 58.9 °C (11.7 W) |

Hardware Testing & Modes

The system supports multiple operating modes controlled by slide switches:

SW0 SW1-SW6 SW7 Mode Description
0 0 0 CPU Welcome System info display
0 0 1 CPU BIST RV32I instruction regression test
1 1 0 IPU Sobel Real-time edge detection
1 2 0 IPU Grayscale Real-time grayscale conversion
1 3 0 IPU Threshold Real-time binary thresholding
1 4 0 IPU Convolution Real-time filter convolution
1 1 TinyML Finger-counting gesture recognition

BIST Mode — VGA Output


The hardware BIST mode runs 61 instructions covering all RV32I base instructions and IPU kernels, displaying pass/fail status directly on the VGA monitor:

CPU BIST MODE - RV32I BASELINE
ALL BASELINE CHECKS PASSED
TEST    EXP        GOT        RESULT
ADDI X1  0000000A  0000000A  PASS
ADDI X2  FFFFFFD   FFFFFFD   PASS
...
JAL  X29 000000EC  000000EC  PASS
JALR X30 00000060  00000060  PASS
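
Each BIST row compares a register value against its expected value. The values are shown as eight upper-case hex digits, so negative results appear in 32-bit two's complement. A hypothetical host-side helper that reproduces the row format is sketched below. It is useful for generating expected tables, but it is not the on-chip checker.

```python
def hex32(value: int) -> str:
    """Render a register value as the BIST table does: 8 upper-case hex digits."""
    return f"{value & 0xFFFF_FFFF:08X}"  # masking yields 32-bit two's complement for negatives

def bist_line(test: str, expected: int, got: int) -> str:
    """One row of the TEST / EXP / GOT / RESULT table (values compared modulo 2**32)."""
    result = "PASS" if (expected ^ got) & 0xFFFF_FFFF == 0 else "FAIL"
    return f"{test:<9}{hex32(expected)}  {hex32(got)}  {result}"

print(bist_line("ADDI X1", 10, 10))  # -> ADDI X1  0000000A  0000000A  PASS
print(hex32(-3))                     # -> FFFFFFFD
```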

Real-Time Image Processing Results

| Mode | Left Panel (Source) | Right Panel (Processed) |
|---|---|---|
| (a) Sobel Edge Detection | Color camera feed | Edge-detected output |
| (b) Grayscale Conversion | Color camera feed | Grayscale output |
| (c) Image Thresholding | Color camera feed | Binary threshold output |
| (d) Convolution Filtering | Color camera feed | Filtered output (sharpen/blur) |

TinyML Finger Counting Results

| Panel | Detection | Fingers Counted | Response |
|---|---|---|---|
| (a) | 1 finger detected | 1 | Real-time |
| (b) | 2 fingers detected | 2 | Real-time |
| (c) | 3 fingers detected | 3 | Real-time |
| (d) | 5 fingers detected | 5 | Real-time |

TinyML Performance Metrics:

  • Overall Classification Accuracy: 80% (8/10 correct)
  • False Positive Rate: 10%
  • False Negative Rate: 10%
  • Classification Latency: 1 frame (real-time, no buffering)

🔬 ASIC Implementation

OpenROAD Flow


The ASIC implementation uses the fully open-source OpenROAD EDA flow with the SkyWater 130-nm CMOS PDK:

Phase 1: RTL Design & IP Integration
    ↓
Phase 2: Functional Verification (Simulation + FPGA)
    ↓
Phase 3: FPGA Prototyping
    ├── Design Entry & RTL Analysis
    ├── Synthesis & Optimization
    ├── Implementation: Place & Route
    └── Bitstream Gen & Programming
    ↓
FPGA Validated ✓ → Validated RTL & Constraints
    ↓
Phase 4: ASIC Implementation (OpenROAD)
    ├── Synthesis (Yosys)
    ├── Floorplan & PDN (Macro placement)
    ├── Placement (Global & Detail)
    ├── CTS & Routing (Clock tree synthesis)
    ├── Physical Verification (DRC & LVS)
    └── GDSII Export
    ↓
GDSII for Fabrication

Technology Selection: SkyWater 130nm


Why SkyWater 130nm?

  • ✅ Fully open-source (Apache 2.0 license)
  • ✅ Mature, robust process with extensive documentation
  • ✅ 583 standard cells in SKY130_FD_SC_HD library
  • ✅ Compatible with OpenROAD automated flow
  • ✅ Active community + Open MPW shuttle programs
  • ✅ Educational accessibility over cutting-edge performance

Physical Design Results

Synthesis Results (Yosys)

| Metric | Value |
|---|---|
| Total Standard Cell Area | 175,483 µm² |
| Equivalent NAND2 Gate Count | ~60,932 gates |
| Total Wire Count | 27,281 |
| Sequential Cells (DFFs) | 2,368 (8.9%) |
| Combinational Cells | 24,323 (91.1%) |

Floorplan & Die Statistics

| Metric | Value |
|---|---|
| Die Width × Height | 760.0 × 760.0 µm |
| Total Die Area | 0.5776 mm² |
| Core Width × Height | 719.44 × 718.08 µm |
| Core Area | 516,615 µm² |
| Core Utilization | 33.97% (target: 32%) |
| Aspect Ratio | 1.00:1 (square) |
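
The floorplan figures are mutually consistent. Core utilization is the synthesized standard-cell area divided by the core area, and 1 mm² = 10⁶ µm². The short check below uses only the numbers in the synthesis and floorplan tables.

```python
die_w_um, die_h_um = 760.0, 760.0
core_w_um, core_h_um = 719.44, 718.08
cell_area_um2 = 175_483  # total standard-cell area reported by Yosys

die_area_mm2 = die_w_um * die_h_um / 1e6  # 1 mm^2 = 1e6 um^2
core_area_um2 = core_w_um * core_h_um
utilization = cell_area_um2 / core_area_um2

print(f"die {die_area_mm2:.4f} mm^2, core {core_area_um2:,.0f} um^2, util {utilization:.2%}")
# -> die 0.5776 mm^2, core 516,615 um^2, util 33.97%
```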


Clock Tree Synthesis (TritonCTS)

| Metric | Value |
|---|---|
| Global Clock Skew | -0.13 ns |
| Maximum Clock Latency | 1.0630 ns |
| Minimum Clock Latency | 1.0482 ns |
| Clock Buffers Inserted | 464 |

Routing Results

| Metric | Value |
|---|---|
| Total Wirelength | 1,312,842 µm (1.3128 m) |
| Routing Layers Used | M1–M5 |
| Final DRC Violations | 0 (clean) |

Static Timing Analysis (OpenSTA)

| Metric | Value |
|---|---|
| Worst Setup Slack | +41.1253 ns |
| Worst Hold Slack | +0.36 ns |
| Critical Path Delay | 53.8747 ns |
| Maximum Operating Frequency | 62.06 MHz |
| Setup Violations | 0 |
| Hold Violations | 0 |

> ✅ Timing closure achieved with positive slack at 62.06 MHz!

Power Analysis

| Metric | Value |
|---|---|
| Leakage Power | 0.092 µW |
| Internal Power | 2,150 µW |
| Switching Power | 1,610 µW |
| Total Power Consumption | 3.760 mW |

Power breakdown: internal power (2.15 mW, about 57%) and switching power (1.61 mW, about 43%) make up essentially all of the 3.76 mW total at 62.06 MHz. Leakage is negligible at 0.092 µW. A budget under 4.0 mW makes the design a candidate for battery-powered and energy-harvesting edge-vision deployments.
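
Power figures are easier to compare across designs as energy. The sketch converts the reported total power into energy per clock cycle and per frame. The 60 FPS rate is borrowed from the FPGA demo, and assuming the ASIC sustains the same rate is an illustration, not a measured result.

```python
total_power_w = 3.76e-3  # total power from the power-analysis table
f_clk_hz = 62.06e6       # post-route maximum frequency
frame_rate_fps = 60      # assumed: same processing rate as the FPGA demo

energy_per_cycle_pj = total_power_w / f_clk_hz * 1e12
energy_per_frame_uj = total_power_w / frame_rate_fps * 1e6
internal_share = 2150 / 3760  # internal / total, both in uW

print(f"{energy_per_cycle_pj:.1f} pJ/cycle, {energy_per_frame_uj:.1f} uJ/frame, internal {internal_share:.0%}")
# -> 60.6 pJ/cycle, 62.7 uJ/frame, internal 57%
```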

Physical Verification Signoff

| Check | Status | Details |
|---|---|---|
| DRC | ✅ CLEAN | 0 violations |
| LVS | ✅ EQUIVALENT | Netlist matches layout |
| Antenna Check | ✅ PASS | 0 violations |
| Metal Density | ✅ COMPLIANT | All layers within SKY130 limits |

GDSII Layout Views

Full-chip GDSII layout with core area, I/O ring, power network, and clock tree

Full-chip GDSII layout showing the core area without the power/ground network and clock tree

Transistor-level zoom of the standard-cell implementation

Right-side I/O pad ring with signal, power, and ground pads


📊 Results & Performance

Functional Verification

| Test | Method | Result |
|---|---|---|
| RV32I Instruction BIST | Simulation + Hardware | 100% PASS (61/61 instructions) |
| IPU Operation Tests | Simulation + Hardware | 100% PASS (7/7 operations) |
| Performance Counters | Simulation | 100% PASS (all counters match) |
| TinyML Finger Count | Hardware | 70% accuracy (real-time) |

IPU Processing Performance

| Operation | Cycles | Time @ 100 MHz (ms) | Max FPS |
|---|---:|---:|---:|
| Grayscale | 65,538 | 0.655 | 1,525 |
| Threshold | 65,538 | 0.655 | 1,525 |
| Sobel Edge | 65,538 | 0.655 | 1,525 |
| Gaussian Blur | 81,922 | 0.819 | 1,220 |
| Sharpen | 81,922 | 0.819 | 1,220 |
| Max Pool | 81,922 | 0.819 | 1,220 |
| Avg Pool | 20,482 | 0.205 | 4,882 |
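
The Time and Max FPS columns follow directly from the cycle counts:

- frame time = cycles / f_clk;
- max FPS = floor(f_clk / cycles).

The sketch reproduces the 100 MHz figures. It also shows what the same cycle counts would give at the ASIC's 62.06 MHz, under the assumption that the IPU's cycle counts are unchanged in the ASIC build.

```python
CYCLES = {  # IPU cycles per 128x128 frame, from the table above
    "Grayscale": 65_538, "Threshold": 65_538, "Sobel Edge": 65_538,
    "Gaussian Blur": 81_922, "Sharpen": 81_922, "Max Pool": 81_922, "Avg Pool": 20_482,
}

def frame_time_ms(cycles: int, f_clk_hz: float) -> float:
    return cycles / f_clk_hz * 1e3

def max_fps(cycles: int, f_clk_hz: float) -> int:
    return int(f_clk_hz // cycles)  # whole frames per second

for name, cycles in CYCLES.items():
    # 100 MHz: FPGA board clock; 62.06 MHz: ASIC post-route Fmax (same cycle counts assumed)
    print(f"{name:<14}{frame_time_ms(cycles, 100e6):.3f} ms  {max_fps(cycles, 100e6):>5} FPS  "
          f"{max_fps(cycles, 62.06e6):>4} FPS @ ASIC")
```

Even at the ASIC clock, every operation stays far above the 60 FPS display rate.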

End-to-End System Performance

| Metric | Value |
|---|---|
| Camera Capture Rate | 30 FPS (OV7670 native) |
| Processing Rate (Sobel) | 60 FPS (frame-doubled) |
| Display Refresh Rate | 60 Hz (VGA 640×480) |
| End-to-End Latency | 16.7 ms |
| Frame Drop Rate | 0.0% |

FPGA vs ASIC Comparison

| Metric | FPGA (Artix-7) | ASIC (SkyWater 130 nm) |
|---|---|---|
| Frequency | ~70 MHz (timing limited) | 62.06 MHz (clean closure) |
| Power | 216 mW | 3.76 mW |
| Area | 16,547 LUTs | 0.5776 mm² die |
| Technology | 28 nm FPGA fabric | 130 nm CMOS |
| BRAM | 30 tiles (60%) | SRAM macros |

Comparison with Related Work

| Platform | Tech. | Pipeline | Vision Accel. | AI Accel. | Open | Vision I/O |
|---|---|---|---|---|---|---|
| EVPIX-RV32 (this work) | 130 nm | 5-stage | IPU (7 ops) | TinyML | Yes | OV7670/VGA |
| PULPino | 65 nm | 4-stage | None | None | Yes | None |
| Ibex | 22 nm | 2-stage | None | None | Yes | None |
| Rocket Chip | 45 nm | 5-stage | RoCC only | None | Yes | None |
| GAP8 | 55 nm | 1+8 cluster | HWCE | 8-core CNN | Partial | PulpCam |
| Eyeriss | 65 nm | None | | CNN | No | None |
| TinyVers | 22 nm | 2-stage + NPU | None | Reconfigurable NPU | No | None |
| Commercial Edge AI | 40–90 nm | Cortex-M | None | NPU | No | None |



📁 Project Structure

evpix_rv32/
├── README.md
├── Makefile
├── asic/
│   ├── flow/
│   │   ├── rtl_files.mk
│   │   ├── asap7/
│   │   │   ├── config.mk
│   │   │   └── constraint.sdc
│   │   └── sky130hd/
│   │       ├── config.mk
│   │       └── constraint.sdc
│   ├── rtl_src/
│   │   ├── asic/
│   │   │   ├── evpix_asic_core_top.sv
│   │   │   └── rv32i_core_asic_extmem.sv
│   │   └── common/
│   │       ├── adder.sv
│   │       ├── alu.sv
│   │       ├── alu_control.sv
│   │       ├── branch_unit.sv
│   │       ├── datapath.sv
│   │       ├── decode_stage.sv
│   │       ├── evpix_finger_model_pkg.sv
│   │       ├── evpix_ml_feature_extractor.sv
│   │       ├── evpix_tinyml_classifier.sv
│   │       ├── ex_mem_reg.sv
│   │       ├── execute_stage.sv
│   │       ├── fetch_stage.sv
│   │       ├── forwarding_unit.sv
│   │       ├── hazard_detection_unit.sv
│   │       ├── id_ex_reg.sv
│   │       ├── if_id_reg.sv
│   │       ├── imm_generator.sv
│   │       ├── instruction_memory_fpga.sv
│   │       ├── ipu_fpga.sv
│   │       ├── main_control.sv
│   │       ├── mem_wb_reg.sv
│   │       ├── program_counter.sv
│   │       ├── register_file.sv
│   │       └── writeback_stage.sv
│   ├── scripts/
│   │   ├── 00_RUN_ME_FIRST_SKY130_FULL.sh
│   │   ├── 00_linux_first_steps.sh
│   │   ├── 01_check_tools.sh
│   │   ├── 02_install_designs_into_orfs.sh
│   │   ├── 05_FIX_ORFS_READ_VERILOG_TCL.sh
│   │   ├── 10_run_sky130hd.sh
│   │   ├── 20_run_asap7.sh
│   │   ├── 30_collect_reports.sh
│   │   ├── 40_yosys_synth_only.sh
│   │   ├── 50_RUN_ASAP7_AFTER_SKY130.sh
│   │   ├── 70_MAKE_GDSII_COPY.sh
│   │   ├── 80_VIEW_LAYOUT.sh
│   │   ├── 81_EXPORT_LAYOUT_SCREENSHOT.sh
│   │   ├── 85_WRITE_GDS_FROM_FILLED_ODB.sh
│   │   ├── 98_KILL_STUCK_EVPIX_FLOW.sh
│   │   ├── 99_SHOW_LAST_ERROR.sh
│   │   └── yosys_synth_only.ys
│   └── signoff/
│       └── evpix_asic_sky130hd_GDSII.txt
├── docs/
│   └── documentation.pdf
├── fpga/
│   ├── bitstream/
│   │   └── evpix_rv32_top.bit
│   ├── constrains/
│   │   └── evpix_basys3.xdc
│   ├── rtl_src/
│   │   ├── adder.sv
│   │   ├── alu.sv
│   │   ├── alu_control.sv
│   │   ├── branch_unit.sv
│   │   ├── data_memory_fpga.sv
│   │   ├── datapath.sv
│   │   ├── decode_stage.sv
│   │   ├── evpix_finger_model_pkg.sv
│   │   ├── evpix_ml_feature_extractor.sv
│   │   ├── evpix_tinyml_classifier.sv
│   │   ├── evpix_top_ov7670_direct.sv
│   │   ├── evpix_vga_frame_display_db.sv
│   │   ├── ex_mem_reg.sv
│   │   ├── execute_stage.sv
│   │   ├── fetch_stage.sv
│   │   ├── forwarding_unit.sv
│   │   ├── hazard_detection_unit.sv
│   │   ├── id_ex_reg.sv
│   │   ├── if_id_reg.sv
│   │   ├── imm_generator.sv
│   │   ├── instruction_memory_fpga.sv
│   │   ├── ipu_fpga.sv
│   │   ├── main_control.sv
│   │   ├── mem_wb_reg.sv
│   │   ├── memory_stage.sv
│   │   ├── ov7670_capture_128_rgb565_to_rgb888.sv
│   │   ├── ov7670_sccb_init.sv
│   │   ├── program_counter.sv
│   │   ├── register_file.sv
│   │   ├── rv32i_core_fpga.sv
│   │   ├── vga_640x480.sv
│   │   └── writeback_stage.sv
│   └── testbench/
│       ├── memfile.hex
│       ├── memfile_ipu_system.hex
│       ├── memfile_pix.hex
│       ├── memfile_rv32i.hex
│       ├── tb_ipu_system.sv
│       ├── tb_rv32i_ipu_custom.sv
│       └── tb_rv32i_top.sv
├── simulation/
│   ├── rtl_src/
│   │   ├── adder.sv
│   │   ├── alu.sv
│   │   ├── alu_control.sv
│   │   ├── branch_unit.sv
│   │   ├── data_memory.sv
│   │   ├── datapath.sv
│   │   ├── decode_stage.sv
│   │   ├── evpix_top.sv
│   │   ├── ex_mem_reg.sv
│   │   ├── execute_stage.sv
│   │   ├── fetch_stage.sv
│   │   ├── forwarding_unit.sv
│   │   ├── hazard_detection_unit.sv
│   │   ├── id_ex_reg.sv
│   │   ├── if_id_reg.sv
│   │   ├── imm_generator.sv
│   │   ├── instruction_memory_fpga.sv
│   │   ├── ipu.sv
│   │   ├── main_control.sv
│   │   ├── mem_wb_reg.sv
│   │   ├── memory_stage.sv
│   │   ├── program_counter.sv
│   │   ├── register_file.sv
│   │   ├── rv32i_core.sv
│   │   └── writeback_stage.sv
│   └── testbench/
│       ├── memfile.hex
│       ├── memfile_ipu_system.hex
│       ├── memfile_pix.hex
│       ├── memfile_rv32i.hex
│       ├── tb_ipu_system.sv
│       ├── tb_rv32i_ipu_custom.sv
│       └── tb_rv32i_top.sv
└── images/
    └── (90+ images — diagrams, ASIC layouts, FPGA results, etc.)


🚀 Getting Started

This guide walks you through setting up your Linux environment and running the three main design flows:

  • Simulation (Vivado xsim)
  • FPGA (Basys-3 + OV7670)
  • ASIC (OpenROAD Flow Scripts)

Follow the steps in order.


1. Simulation (Vivado xsim)

The simulation flow uses Vivado's built-in simulator (xsim) and sources RTL from the simulation/ directory.

1.1 Prerequisites

| Tool | Purpose | Install Command |
|---|---|---|
| make | Build automation | sudo apt install -y make |
| build-essential | Compilers & development tools | sudo apt install -y build-essential |
| Xilinx Vivado | Simulation, synthesis & FPGA | See §1.2 |

1.2 Installing Xilinx Vivado (WebPACK — Free)

Vivado is required for both simulation and FPGA flows.

The free WebPACK edition supports Artix-7 devices.

Step A — Download the Installer

  1. Create a free AMD/Xilinx account at https://www.xilinx.com
  2. Download the AMD Unified Installer for FPGAs & Adaptive SoCs
    • Linux Self-Extracting Web Installer (~290 MB)
    • Available from the Vivado download page.

Step B — Install Dependencies

sudo apt update

sudo apt install -y \
    libtinfo-dev \
    libncurses-dev \
    libglib2.0-dev \
    libgtk2.0-dev \
    zlib1g \
    python3-dev \
    python3-pip \
    default-jre \
    default-jdk \
    libswt-gtk-4-jni \
    locales \
    tar \
    gzip \
    gcc \
    g++ \
    make \
    build-essential

Step C — Run the Installer

# Navigate to your download directory

chmod +x FPGAs_AdaptiveSoCs_Unified_2024.1_0522_2023_Lin64.bin

./FPGAs_AdaptiveSoCs_Unified_2024.1_0522_2023_Lin64.bin

During installation:

  • Select Vivado ML Standard Edition
  • Select Vivado
  • (Optional) Install Vitis
  • Select Artix-7 devices only
  • Choose an installation directory

Recommended:

/tools/Xilinx/

or

/opt/Xilinx/

Step D — Add Vivado to PATH

Add the following line to your ~/.bashrc.

Adjust the version/path if necessary.

source /tools/Xilinx/Vivado/2024.1/settings64.sh

Reload your shell.

source ~/.bashrc

Step E — Verify Installation

vivado -version

xvlog -version

xelab -version

xsim -version

Troubleshooting

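If Vivado fails to start with

libtinfo.so.5: cannot open shared object file

install the ncurses development package and point libtinfo.so.5 at the installed libtinfo.so.6: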

sudo apt install libtinfo-dev

sudo ln -s \
/lib/x86_64-linux-gnu/libtinfo.so.6 \
/lib/x86_64-linux-gnu/libtinfo.so.5

1.3 Clone & Setup the Project

git clone https://github.com/aukhalid/evpix_rv32.git

cd evpix_rv32

# Create the build directory tree
make setup

Run make setup only once after cloning.


1.4 Run Simulations

RV32I Core Regression

make sim_core

IPU Functional Testbench

make sim_ipu

Custom ISA + IPU Testbench

make sim_custom

Run Everything

make sim_all

Simulation Outputs

Logs

build/sim/logs/

Waveforms (when WAVES=1)

build/sim/xsim.dir/

Memory HEX files are automatically copied from

simulation/testbench/

Open the Waveform Viewer

make sim_core WAVES=1

This launches the Vivado xsim GUI.


2. FPGA Prototyping (Basys-3 + OV7670)

The FPGA flow uses Vivado in batch mode and sources RTL from the fpga/ directory.


2.1 Additional Prerequisites

| Hardware | Details |
|---|---|
| Digilent Basys-3 | Xilinx Artix-7 FPGA (xc7a35tcpg236-1) |
| OV7670 Camera | With breakout board, connected via Pmod |
| VGA Monitor | Standard VGA |
| USB-A → Micro-B Cable | Programming & power |

Note: The free Vivado WebPACK license fully supports the Artix-7 on the Basys-3.


2.2 FPGA Build Flow

Run the complete flow:

make fpga_all

Or execute each step individually.

Synthesis

make fpga_synth_only

Produces

post_synth.dcp

Place & Route

make fpga_impl_only

Produces

post_route.dcp

Bitstream Generation

make fpga_bit_only

FPGA Outputs

Checkpoints

build/fpga/synth/

build/fpga/impl/

Bitstream

build/fpga/bitstream/evpix_rv32_top.bit

Reports

build/fpga/reports/

Includes:

  • Timing
  • Utilization
  • Power

2.3 Program the Board

make fpga_program

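This configures the FPGA with the bitstream over JTAG. JTAG configuration is volatile, so the design must be reloaded after a power cycle.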


Physical Connections

  • Connect the Basys-3 using USB
  • Connect the OV7670 camera to the PMOD header
  • Connect a VGA monitor
  • Power on the board
  • Use onboard switches to select operating mode

Constraint file:

fpga/constrains/evpix_basys3.xdc

Overridable Variables

make fpga_all \
FPGA_TOP=your_custom_top \
FPGA_PART=xc7a35tcpg236-1

3. ASIC Implementation (OpenROAD Flow Scripts)

The ASIC flow uses OpenROAD Flow Scripts (ORFS) with either

  • SKY130HD
  • ASAP7

to perform complete RTL-to-GDSII physical implementation.


3.1 Prerequisites

| Tool | Purpose | Install |
|---|---|---|
| git | Version control | sudo apt install -y git |
| python3-venv, python3-pip, python3-yaml | Python tooling | sudo apt install -y python3-venv python3-pip python3-yaml |
| OpenROAD Flow Scripts | Complete RTL-to-GDSII flow | See §3.2 |
| KLayout | GDS viewer | Installed by ORFS |

Recommended Hardware

Minimum

  • 1 CPU core
  • 8 GB RAM

Recommended

  • 4+ CPU cores
  • 16+ GB RAM
  • 100+ GB free disk space

3.2 Installing OpenROAD Flow Scripts

Step A — Clone

mkdir -p ~/Work/vlsi/tools

cd ~/Work/vlsi/tools

git clone --recursive \
https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts

cd OpenROAD-flow-scripts

Step B — Install Dependencies

sudo ./setup.sh

This installs

  • CMake
  • Boost
  • Lemon
  • SWIG
  • Eigen
  • Yosys dependencies
  • KLayout

Step C — Build

./build_openroad.sh --local

Compilation typically takes 20–60 minutes, depending on hardware.


Step D — Configure Environment

Add to your ~/.bashrc

source ~/Work/vlsi/tools/OpenROAD-flow-scripts/env.sh

Reload

source ~/.bashrc

Step E — Verify Installation

yosys -help

openroad -help

Step F — Test with Sample Design

cd ~/Work/vlsi/tools/OpenROAD-flow-scripts/flow

make

make gui_final

This launches the OpenROAD GUI.


3.3 ASIC Build Flow for EVPIX-RV32

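The Makefile assumes ORFS is installed at

~/OpenROAD-flow-scripts

If you installed it elsewhere (for example, at the §3.2 location), pass its path to each make invocation:

make asic_setup_check ORFS_DIR=$HOME/Work/vlsi/tools/OpenROAD-flow-scripts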


Verify ORFS

make asic_setup_check

Install Design

make asic_install

Synthesis Only

make asic_synth_only

Full RTL-to-GDSII Flow

make asic_all

The full flow typically takes 20–60 minutes.


ASAP7 Flow

make asic_asap7

ASIC Outputs

Logs

build/asic/logs/

Reports

build/asic/reports/qor_summary.txt

GDSII

build/asic/gds/

The asic_gds_copy target copies the GDSII there automatically.

View Layout

make asic_view

Opens the GDSII in KLayout.


View QoR

make asic_report

Prints

  • Area
  • Timing
  • Power

Kill a Stuck Flow

make asic_kill

Overridable Variables

make asic_all \
ORFS_DIR=/custom/path \
PLATFORM=sky130hd

4. Cleanup Commands

Remove simulation artifacts

make clean

Remove simulation, FPGA and ASIC artifacts

make clean_all

Remove the entire build directory

make distclean

Quick Reference

| Target | Description |
|---|---|
| make help | Show all available targets |
| make setup | Create build directory tree |
| make sim_core | Run RV32I simulation |
| make sim_ipu | Run IPU simulation |
| make sim_custom | Run custom ISA simulation |
| make sim_all | Run all simulations |
| make fpga_synth_only | FPGA synthesis |
| make fpga_impl_only | FPGA implementation |
| make fpga_bit_only | FPGA bitstream |
| make fpga_all | Complete FPGA flow |
| make fpga_program | Program Basys-3 |
| make asic_setup_check | Verify ORFS installation |
| make asic_install | Install project into ORFS |
| make asic_synth_only | ASIC synthesis |
| make asic_all | Complete ASIC flow |
| make asic_asap7 | ASAP7 flow |
| make asic_report | QoR summary |
| make asic_view | View layout |
| make asic_kill | Stop running flow |

Need Help?

Documentation

make help

Simulation Logs

build/sim/logs/

FPGA Reports

build/fpga/reports/

ASIC Logs

build/asic/logs/

For the latest ASIC error

make asic_last_error

🔮 Future Work

  1. Compiler Support — GCC/LLVM backend with custom instruction intrinsics
  2. DMA Engine — Lightweight DMA for autonomous data movement
  3. Higher Resolution — QVGA (320×240) and VGA (640×480) support with external memory
  4. Advanced IPU Kernels — Morphological operations, histogram equalization, motion estimation
  5. Native TinyML — Port finger-counting model to run entirely on EVPIX-RV32 CPU/IPU
  6. Formal Verification — SVA assertions and model checking for pipeline correctness
  7. Advanced Node ASIC — Migration to 65nm or 28nm for area/power reduction
  8. Multi-Core Extension — Dual-core heterogeneous configuration

📖 Citation

If you use EVPIX-RV32 in your research, please cite:

@thesis{khalid2026evpix,
  author    = {Khalid, Ahasan Ullah},
  title     = {{EVPIX-RV32: 5-Stage Custom RISC-V SoC with Integrated IPU and TinyML Support for Real-Time Edge-Vision AI Acceleration}},
  school    = {Chittagong University of Engineering and Technology},
  year      = {2026},
  type      = {Bachelor's Thesis},
  department = {Electronics and Telecommunication Engineering},
  address   = {Chattogram-4349, Bangladesh}
}

🙏 Acknowledgments

  • Department: Department of Electronics & Telecommunication Engineering
  • Supervisor: Md. Farhad Hossain, Assistant Professor, ETE Department, CUET
  • Special Thanks: Arif Istiaque, Assistant Professor, ETE Department, CUET
  • Institution: Chittagong University of Engineering & Technology (CUET)
  • Open-Source Community: RISC-V International, OpenROAD Project, SkyWater PDK, YosysHQ
  • Tools: Xilinx Vivado, Digilent Basys-3 FPGA, OpenROAD-flow-scripts, Magic, KLayout

📄 License

This project is licensed under the Apache 2.0 License — see the LICENSE file for details.

The RISC-V ISA is an open standard maintained by RISC-V International. The SkyWater 130nm PDK is provided under the Apache 2.0 license by Google and SkyWater Technology.


Built with ❤️ at CUET | Open Source | Open Silicon | Open Education


About

EVPIX-RV32: 5-Stage Custom RISC-V SoC with Integrated IPU and TinyML Support for Real-Time Edge-Vision AI Acceleration. RTL-to-GDSII design, verification, Basys-3 Artix-7 FPGA prototyping, and SkyWater 130-nm CMOS ASIC implementation.
