EVPIX-RV32: 5-Stage Custom RISC-V SoC with Integrated IPU and TinyML Support for Real-Time Edge-Vision AI Acceleration
A custom 5-stage edge-vision processor combining RISC-V programmability with a hardware-accelerated Image Processing Unit and TinyML inference. The design was verified with BIST testbenches, prototyped on a Digilent Basys-3 (AMD Artix-7) FPGA, and taken through a full RTL-to-GDSII ASIC implementation in SkyWater 130-nm CMOS.
- Overview
- System Architecture
- Key Features
- Hardware Architecture
- Custom ISA Extensions
- FPGA Implementation
- ASIC Implementation
- Results & Performance
- Project Structure
- Getting Started
- Future Work
- Citation
- Acknowledgments
EVPIX-RV32 is a custom 5-stage pipelined RISC-V RV32I processor with hardware image-processing extensions, a streaming Image Processing Unit (IPU), and TinyML support, built for real-time edge-vision AI acceleration. The architecture extends the RISC-V ISA with custom instructions for grayscale conversion, thresholding, Sobel edge detection, and 2D convolution in the custom-0 opcode space, and connects to an autonomous IPU that processes 128×128 frames with pixel-level parallelism.
The system was prototyped on the Digilent Basys-3 AMD Artix-7 FPGA with an OV7670 camera and dual-region VGA display, sustaining 60 FPS with zero frame drops. A hardware Built-In Self-Test (BIST) mode runs 61 instructions and shows pass/fail results on the VGA monitor, while a TinyML finger-counting demo validates lightweight neural inference on the platform.
The design was implemented through the open-source OpenROAD flow targeting SkyWater 130-nm CMOS, producing a DRC-clean, LVS-equivalent GDSII layout on a 0.5776 mm² die, operating at 62.06 MHz with 3.76 mW total power.
Thesis: Bachelor of Science (B.Sc.)
Department: Electronics and Telecommunication Engineering
Institution: Chittagong University of Engineering & Technology (CUET)
Author: Ahasan Ullah Khalid (2008051)
Supervisor: Md. Farhad Hossain, Assistant Professor, Department of ETE
The EVPIX-RV32 system is a heterogeneous vision SoC that combines a general-purpose 32-bit RISC-V processor with dedicated image-processing hardware. The architecture uses a Harvard-style memory organization with separate instruction and data memories, plus a multi-port data memory that supports concurrent CPU and IPU access.
| Component | Description |
|---|---|
| RV32I Core | 5-stage pipelined processor with full hazard detection and forwarding |
| Image Processing Unit (IPU) | Dedicated hardware accelerator for 7 image-processing operations |
| Memory Subsystem | 64KB unified data memory (dual-port BRAM) + Instruction ROM |
| Camera Interface | OV7670 parallel DVP with SCCB configuration (128×128 @ 30 FPS) |
| Display Interface | VGA controller (640×480 @ 60Hz) with dual-buffered frame display |
| TinyML Accelerator | Hardware feature extractor + classifier for finger-counting |
| Control/Status I/O | 16 slide switches, 16 LEDs for mode selection and status |
The system interconnect (AXI Bus) enables communication between the RV32I core, IPU, TinyML accelerator, unified data memory, and peripheral interfaces. The camera module feeds raw pixel data through the OV7670 DVP interface, while the VGA display controller outputs processed frames with real-time performance overlays.
- ✅ Complete RV32I Base ISA — All 40+ integer instructions with full forwarding and hazard detection
- ✅ 8 Custom Image-Processing Instructions — GRAYSCALE, THRESH, SOBEL, CONV, VDOT, RELU, HACC, OTSU
- ✅ 7 Hardware-Accelerated IPU Operations — Grayscale, Threshold, Sobel Edge, 2D Convolution, Max Pool, Avg Pool, Max Pixel
- ✅ Real-Time 60 FPS Processing — Zero frame drops at 128×128 resolution
- ✅ Hardware BIST Mode — 61-instruction regression with VGA pass/fail display
- ✅ TinyML Finger Counting — Real-time gesture recognition (0-5 fingers)
- ✅ Open-Source ASIC Flow — Complete RTL-to-GDSII using OpenROAD + SkyWater 130nm
- ✅ Low Power — 3.76 mW total power at 62.06 MHz in 130nm CMOS
The processor implements a classic five-stage RISC pipeline with full hazard detection and data forwarding:
| Stage | Function | Key Components |
|---|---|---|
| IF — Instruction Fetch | Reads next instruction from memory | Program Counter, Instruction Memory, PC Incrementer |
| ID — Instruction Decode | Decodes instruction, reads registers | Register File (32×32-bit), Immediate Generator, Control Unit |
| EX — Execute | Performs ALU operations, branch decisions | ALU, Branch Comparator, Forwarding MUXes, IPU Interface |
| MEM — Memory Access | Reads/writes data memory | Data Memory, Load/Store Logic, Byte/Halfword/Word access |
| WB — Write Back | Writes results to register file | Result MUX (ALU/Mem/PC+4), Register File Write Port |
The pipeline achieves near-ideal IPC for sequential code with single-cycle branch resolution and full forwarding paths.
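The forwarding and hazard-detection behaviour described above can be sketched as a small Python reference model. This follows the textbook five-stage scheme and is not taken from `forwarding_unit.sv` or `hazard_detection_unit.sv`; all function and argument names are illustrative assumptions.

```python
def forward_select(src_reg, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Pick the source of an EX-stage operand: 'EX_MEM', 'MEM_WB' or 'REGFILE'."""
    # x0 is hard-wired to zero in RISC-V, so it is never forwarded.
    if src_reg != 0 and ex_mem_reg_write and ex_mem_rd == src_reg:
        return "EX_MEM"   # the most recent producer takes priority
    if src_reg != 0 and mem_wb_reg_write and mem_wb_rd == src_reg:
        return "MEM_WB"
    return "REGFILE"


def load_use_stall(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    """A load followed directly by a consumer cannot be forwarded in time,
    so the textbook pipeline inserts one bubble."""
    return bool(id_ex_mem_read and id_ex_rd != 0
                and id_ex_rd in (if_id_rs1, if_id_rs2))
```

For example, `forward_select(5, 5, True, 5, True)` returns `"EX_MEM"`, because the EX/MEM result is newer than the MEM/WB one.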
The IPU is a dedicated hardware accelerator controlled through custom R-type instructions. It features:
- FSM Controller — 16 states from idle to finish
- 3×3 Window Registers — win0-win8 pixel buffer for convolution operations
- Kernel Coefficient ROM — 6 built-in kernels (Identity, Sobel X, Sobel Y, Gaussian Blur, Sharpen, Edge Detect)
- Sobel Gradient Unit — Computes Gx, Gy, and gradient magnitude
- Pooling Unit — 2×2 max/average pooling
- Pixel Write/ALU Logic — Gray, thresh, conv, sobel, pool selection
- Memory Interface — Direct dual-port BRAM access
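As a rough software model of the 3×3 window and Sobel gradient unit listed above, the sketch below applies the standard Sobel X and Sobel Y kernels to a window. Two points are assumptions rather than facts from the RTL: the window is treated as row-major (win0 to win8), and the magnitude uses the common hardware approximation |Gx| + |Gy| saturated to 8 bits.

```python
# Standard Sobel kernels; the IPU kernel ROM layout itself is not shown here.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def convolve3x3(window, kernel):
    """Multiply-accumulate a 3x3 pixel window with a 3x3 kernel."""
    return sum(window[r][c] * kernel[r][c] for r in range(3) for c in range(3))


def sobel_magnitude(window):
    """Approximate gradient magnitude as |Gx| + |Gy|, clamped to 0..255.
    ASSUMPTION: the hardware may use a different magnitude formula."""
    gx = convolve3x3(window, SOBEL_X)
    gy = convolve3x3(window, SOBEL_Y)
    return min(abs(gx) + abs(gy), 255)
```

A flat window yields 0, while a strong vertical edge saturates at 255.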
| Operation | Description | Performance (128×128 @ 100MHz) |
|---|---|---|
| Grayscale | RGB to luminance: Y = (77R + 150G + 29B) >> 8 | 1,525 FPS |
| Threshold | Binary threshold at configurable level | 1,525 FPS |
| Max Pixel | Finds maximum pixel value in image | 1,525 FPS |
| Sobel Edge | 3×3 gradient magnitude computation | 1,525 FPS |
| 2D Convolution | Programmable kernel convolution | 1,220 FPS |
| Max Pool | Non-overlapping 2×2 max pooling | 1,220 FPS |
| Avg Pool | Non-overlapping 2×2 average pooling | 4,882 FPS |
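The grayscale formula in the table is easy to check with a bit-accurate Python model. The weights sum to 256, so the shift by 8 keeps pure white at 255. The threshold comparison direction (`>=`) and the 0/255 output levels are assumptions for illustration.

```python
def grayscale(r, g, b):
    """Integer luminance: Y = (77R + 150G + 29B) >> 8, inputs in 0..255."""
    return (77 * r + 150 * g + 29 * b) >> 8


def threshold(y, level):
    """Binary threshold. ASSUMPTION: pixels at or above `level` become 255."""
    return 255 if y >= level else 0
```

For instance, `grayscale(255, 255, 255)` gives 255 and `grayscale(255, 0, 0)` gives 76.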
The memory map is organized as follows:
| Address Range | Size | Content |
|---|---|---|
| 0x0000_0000 – 0x0000_BFFF | 48 KB | 128×128 RGB888 Source Image Buffer |
| 0x0000_C000 – 0x0000_FFFF | 16 KB | 128×128 8-bit Processed Output Buffer |
The 64KB unified data memory uses dual-port BRAM supporting concurrent CPU load/store and IPU direct memory access.
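The buffer sizes follow directly from the frame geometry: 128×128×3 bytes is 48 KB and 128×128×1 byte is 16 KB. The sketch below computes byte addresses for a pixel, assuming row-major storage with three packed bytes per RGB888 pixel; the packing and scan order are assumptions, since they are not specified here.

```python
WIDTH = HEIGHT = 128
SRC_BASE = 0x0000_0000   # RGB888 source image buffer
OUT_BASE = 0x0000_C000   # 8-bit processed output buffer


def src_pixel_addr(x, y):
    """Address of the first byte of source pixel (x, y).
    ASSUMPTION: row-major order, 3 packed bytes per pixel."""
    return SRC_BASE + (y * WIDTH + x) * 3


def out_pixel_addr(x, y):
    """Address of output pixel (x, y). ASSUMPTION: row-major, 1 byte per pixel."""
    return OUT_BASE + y * WIDTH + x
```

Under these assumptions the last source byte lands on 0xBFFF and the last output byte on 0xFFFF, matching the address ranges in the table.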
The OV7670 CMOS camera module connects via parallel DVP interface with:
- 8-bit pixel data bus
- PCLK (pixel clock), HREF (horizontal sync), VSYNC (vertical sync)
- SCCB (I2C-compatible) configuration bus
- 128×128 resolution at 30 FPS native capture rate
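The capture module in the project tree is named `ov7670_capture_128_rgb565_to_rgb888.sv`, which suggests each pixel arrives as two bytes on the 8-bit bus and is expanded to 24-bit colour. A minimal Python sketch of that expansion follows. The byte order (high byte first) and the bit-replication scaling are assumptions, not details confirmed by the RTL.

```python
def rgb565_to_rgb888(first_byte, second_byte):
    """Expand one RGB565 pixel, received as two bytes, to an (R, G, B) tuple.
    ASSUMPTION: first byte carries RRRRRGGG, second byte carries GGGBBBBB."""
    word = ((first_byte & 0xFF) << 8) | (second_byte & 0xFF)
    r5 = (word >> 11) & 0x1F
    g6 = (word >> 5) & 0x3F
    b5 = word & 0x1F
    # Replicate the top bits into the low bits so full scale maps to 255.
    r8 = (r5 << 3) | (r5 >> 2)
    g8 = (g6 << 2) | (g6 >> 4)
    b8 = (b5 << 3) | (b5 >> 2)
    return r8, g8, b8
```

A plain left shift would be cheaper in hardware but maps full-scale red to 248 instead of 255; bit replication avoids that at the cost of a little wiring.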
The Basys-3 VGA interface provides:
- 12-bit RGB (4-bit per channel) via resistor-DAC network
- 640×480 @ 60Hz standard timing
- Dual-region display: original frame (left) + processed frame (right)
- On-screen performance overlays (FPS, cycle counts, mode status)
The TinyML subsystem includes:
- Hardware Feature Extractor — Skin-color detection and finger-region segmentation
- Classifier — Quantized neural network for finger-counting (0-5 classes)
- Temporal Stability Filter — Reduces jitter between frames
- Integration — Results overlaid on VGA display in real-time
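One simple way to realise a temporal stability filter is a majority vote over the last few per-frame predictions. The sketch below illustrates that idea only; the actual filter in the design may work differently, and the class name and window length are hypothetical.

```python
from collections import Counter, deque


class TemporalStabilityFilter:
    """Majority vote over the most recent `window` finger-count predictions."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, predicted_count):
        """Record one per-frame prediction and return the smoothed count."""
        self.history.append(predicted_count)
        # most_common(1) returns the label seen most often in the window.
        label, _ = Counter(self.history).most_common(1)[0]
        return label
```

With a window of 5, a single misclassified frame in a run of correct ones does not change the displayed count.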
The processor extends RV32I with 8 custom instructions encoded in the custom-0 opcode space (0001011):
| Instruction | Opcode | funct3 | funct7 | Description |
|---|---|---|---|---|
| GRAYSCALE | 0001011 | 000 | kernel[3:0], op=0 | Convert RGB to grayscale |
| THRESH | 0001011 | 000 | kernel[3:0], op=1 | Apply binary threshold |
| SOBEL | 0001011 | 000 | kernel[3:0], op=2 | Sobel edge detection |
| CONV | 0001011 | 000 | kernel[3:0], op=3 | 2D convolution with kernel |
| VDOT | 0001011 | 001 | — | Vector dot product for ML |
| RELU | 0001011 | 010 | — | Rectified linear activation |
| HACC | 0001011 | 011 | — | Histogram accumulation |
| OTSU | 0001011 | 100 | — | Otsu threshold calculation |
The funct3 field selects the IPU operation type (START, STATUS, RESULT, PERF), while funct7 selects the algorithm and kernel. This encoding maintains full compatibility with standard RV32I tools and compilers.
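To make the encoding concrete, the sketch below assembles a 32-bit R-type word in the custom-0 opcode space. The R-type field positions are the standard RISC-V ones. How `kernel[3:0]` and `op` are packed inside funct7 is an assumption made here for illustration (kernel in bits 6:3, op in bits 2:0); check the RTL decoder for the real layout.

```python
CUSTOM0_OPCODE = 0b0001011


def encode_rtype(funct7, rs2, rs1, funct3, rd, opcode=CUSTOM0_OPCODE):
    """Pack the standard RISC-V R-type fields into a 32-bit instruction word."""
    if not (0 <= funct7 < 128 and 0 <= funct3 < 8 and 0 <= opcode < 128):
        raise ValueError("funct7/funct3/opcode out of range")
    if not all(0 <= reg < 32 for reg in (rs2, rs1, rd)):
        raise ValueError("register index out of range")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def ipu_funct7(kernel, op):
    """HYPOTHETICAL packing: kernel in funct7[6:3], op in funct7[2:0]."""
    return ((kernel & 0xF) << 3) | (op & 0x7)


# Example: a SOBEL-style word (op=2, kernel=0) with rd=x1, rs1=x2, rs2=x3.
sobel_word = encode_rtype(ipu_funct7(0, 2), rs2=3, rs1=2, funct3=0b000, rd=1)
```

Because the instructions use the R-type format, a word built this way can also be emitted from assembly with the GNU assembler's `.insn r` directive, without modifying the toolchain.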
The Digilent Basys-3 development board features:
- FPGA: Xilinx Artix-7 XC7A35T-1CPG236C
- 33,280 logic cells (20,800 LUTs) | 41,600 FFs | 50 block RAM tiles (1,800 Kb) | 90 DSP48E1 slices
- Clock: 100 MHz onboard oscillator
- I/O: 16 slide switches, 16 LEDs, 5 pushbuttons, 4-digit 7-segment display
- Display: VGA port (12-bit RGB)
- Expansion: 4 Pmod connectors
- Programming: USB-JTAG via shared UART/JTAG port
| Callout | Component | Use in EVPIX-RV32 |
|---|---|---|
| 1 | Power Good LED | Power status indicator |
| 2 | Pmod Ports | OV7670 camera connection |
| 3 | Analog Pmod (XADC) | — |
| 4 | 7-Segment Display | Performance counters |
| 5 | Slide Switches (16) | Mode selection (CPU/IPU/BIST/TinyML) |
| 6 | LEDs (16) | Status indicators |
| 7 | Pushbuttons (5) | Reset, user input |
| 8 | FPGA Programming Done LED | Configuration status |
| 9 | FPGA Configuration Reset | Hardware reset |
| 10 | Programming Mode Jumper | JTAG/SPI selection |
| 11 | Shared UART/JTAG USB | Programming and debug |
| 12 | VGA Connector | Monitor output |
| 13 | Shared UART/JTAG USB | Alternative programming |
| 14 | External Power Connector | — |
| 15 | Power Switch | Board power |
| 16 | Power Select Jumper | USB/External power |
The physical prototype connects:
- OV7670 camera → Pmod-compatible breakout → Basys-3 Pmod port
- VGA monitor → Basys-3 VGA port via DB15 cable
- USB power → Basys-3 micro-USB for power and programming
The FPGA implementation follows the standard Xilinx Vivado flow:
1. New RTL Project → Target: xc7a35tcpg236-1
2. Design Entry → Add SystemVerilog (*.sv) sources + XDC constraints
3. RTL Analysis → Elaborate design, check for issues
4. Synthesis → Area optimization, default strategy
5. Implementation → Placement & Routing, default settings
6. Bitstream Generation → Compression enabled
7. Hardware Programming → FPGA via JTAG (Hardware Manager)
| Resource | Used | Available | Utilization |
|---|---|---|---|
| Slice LUTs | 16,547 | 20,800 | 79.55% |
| Slice Registers | 5,534 | 41,600 | 13.30% |
| Block RAM Tile | 30 | 50 | 60.00% |
| DSP Slices | 0 | 90 | 0.00% |
| Clock Buffers (BUFG) | 2 | 32 | 6.25% |
| Bonded IOB | 62 | 106 | 58.49% |
Note: All arithmetic is LUT-based (no DSP slices) for maximum portability across FPGA families and clean ASIC synthesis.
| Metric | Value |
|---|---|
| Total On-Chip Power | 216.0 mW |
| Dynamic Power | 142.0 mW (66%) |
| Device Static Power | 73.0 mW (34%) |
| BRAM Power | 46.0 mW (32% of dynamic) |
| Logic Power | 31.0 mW (22% of dynamic) |
| Signals Power | 27.0 mW (19% of dynamic) |
| I/O Power | 19.0 mW (14% of dynamic) |
| Clocks Power | 19.0 mW (13% of dynamic) |
| Junction Temperature | 26.1°C |
| Thermal Margin | 58.9°C (11.7 W) |
The system supports multiple operating modes controlled by slide switches:
| SW0 | SW1-SW6 | SW7 | Mode | Description |
|---|---|---|---|---|
| 0 | 0 | 0 | CPU Welcome | System info display |
| 0 | 0 | 1 | CPU BIST | RV32I instruction regression test |
| 1 | 1 | 0 | IPU Sobel | Real-time edge detection |
| 1 | 2 | 0 | IPU Grayscale | Real-time grayscale conversion |
| 1 | 3 | 0 | IPU Threshold | Real-time binary thresholding |
| 1 | 4 | 0 | IPU Convolution | Real-time filter convolution |
| 1 | — | 1 | TinyML | Finger-counting gesture recognition |
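The mode table can be read as a small decoder. The Python sketch below mirrors the rows above and returns "Undefined" for any combination the table does not list. Treating SW1-SW6 as a single integer field is an assumption about how those switches are interpreted.

```python
IPU_MODES = {
    1: "IPU Sobel",
    2: "IPU Grayscale",
    3: "IPU Threshold",
    4: "IPU Convolution",
}


def decode_mode(sw0, sw1_6, sw7):
    """Map switch settings to an operating mode, following the table rows.
    ASSUMPTION: sw1_6 is the value of SW1-SW6 read as one integer."""
    if sw0 and sw7:
        return "TinyML"                       # SW1-SW6 are don't-care here
    if sw0:
        return IPU_MODES.get(sw1_6, "Undefined")
    if sw1_6 == 0:
        return "CPU BIST" if sw7 else "CPU Welcome"
    return "Undefined"
```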
The hardware BIST mode runs 61 instructions covering all RV32I base instructions and IPU kernels, displaying pass/fail status directly on the VGA monitor:
CPU BIST MODE - RV32I BASELINE
ALL BASELINE CHECKS PASSED
TEST EXP GOT RESULT
ADDI X1 0000000A 0000000A PASS
ADDI X2 FFFFFFD FFFFFFD PASS
...
JAL X29 000000EC 000000EC PASS
JALR X30 00000060 00000060 PASS
| Mode | Left Panel (Source) | Right Panel (Processed) |
|---|---|---|
| (a) Sobel Edge Detection | Color camera feed | Edge-detected output |
| (b) Grayscale Conversion | Color camera feed | Grayscale output |
| (c) Image Thresholding | Color camera feed | Binary threshold output |
| (d) Convolution Filtering | Color camera feed | Filtered output (sharpen/blur) |
| Detection | Fingers Counted | Response |
|---|---|---|
| (a) 1 Finger Detected | 1 | Real-time |
| (b) 2 Fingers Detected | 2 | Real-time |
| (c) 3 Fingers Detected | 3 | Real-time |
| (d) 5 Fingers Detected | 5 | Real-time |
TinyML Performance Metrics:
- Overall Classification Accuracy: 80% (8/10 correct)
- False Positive Rate: 10%
- False Negative Rate: 10%
- Classification Latency: 1 frame (real-time, no buffering)
The ASIC implementation uses the fully open-source OpenROAD EDA flow with the SkyWater 130-nm CMOS PDK:
Phase 1: RTL Design & IP Integration
↓
Phase 2: Functional Verification (Simulation + FPGA)
↓
Phase 3: FPGA Prototyping
├── Design Entry & RTL Analysis
├── Synthesis & Optimization
├── Implementation: Place & Route
└── Bitstream Gen & Programming
↓
FPGA Validated ✓ → Validated RTL & Constraints
↓
Phase 4: ASIC Implementation (OpenROAD)
├── Synthesis (Yosys)
├── Floorplan & PDN (Macro placement)
├── Placement (Global & Detail)
├── CTS & Routing (Clock tree synthesis)
├── Physical Verification (DRC & LVS)
└── GDSII Export
↓
GDSII for Fabrication
Why SkyWater 130nm?
- ✅ Fully open-source (Apache 2.0 license)
- ✅ Mature, robust process with extensive documentation
- ✅ 583 standard cells in SKY130_FD_SC_HD library
- ✅ Compatible with OpenROAD automated flow
- ✅ Active community + Open MPW shuttle programs
- ✅ Educational accessibility over cutting-edge performance
| Metric | Value |
|---|---|
| Total Standard Cell Area | 175,483 µm² |
| Equivalent NAND2 Gate Count | ~60,932 gates |
| Total Wire Count | 27,281 |
| Sequential Cells (DFFs) | 2,368 (8.9%) |
| Combinational Cells | 24,323 (91.1%) |
| Metric | Value |
|---|---|
| Die Width × Height | 760.0 × 760.0 µm |
| Total Die Area | 0.5776 mm² |
| Core Width × Height | 719.44 × 718.08 µm |
| Core Area | 516,615 µm² |
| Core Utilization | 33.97% (Target: 32%) |
| Aspect Ratio | 1.00:1 (square) |
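The floorplan figures are mutually consistent, which a few lines of Python arithmetic confirm. The values are copied from the synthesis and floorplan tables; the variable names are illustrative.

```python
# Values from the tables, in micrometres and square micrometres.
die_w_um, die_h_um = 760.0, 760.0
core_w_um, core_h_um = 719.44, 718.08
std_cell_area_um2 = 175_483

die_area_mm2 = die_w_um * die_h_um / 1e6      # 1 mm^2 = 1e6 um^2
core_area_um2 = core_w_um * core_h_um
utilization_pct = 100.0 * std_cell_area_um2 / core_area_um2

print(f"die  = {die_area_mm2:.4f} mm^2")
print(f"core = {core_area_um2:,.0f} um^2")
print(f"util = {utilization_pct:.2f} %")
```

Core utilization here is simply total standard-cell area divided by core area.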
| Metric | Value |
|---|---|
| Global Clock Skew | -0.13 ns |
| Maximum Clock Latency | 1.0630 ns |
| Minimum Clock Latency | 1.0482 ns |
| Clock Buffers Inserted | 464 |
| Metric | Value |
|---|---|
| Total Wirelength | 1,312,842 µm (1.3128 m) |
| Routing Layers Used | M1 - M5 |
| Final DRC Violations | 0 (CLEAN) |
| Metric | Value |
|---|---|
| Worst Setup Slack | +41.1253 ns |
| Worst Hold Slack | +0.36 ns |
| Critical Path Delay | 53.8747 ns |
| Maximum Operating Frequency | 62.06 MHz |
| Setup Violations | 0 |
| Hold Violations | 0 |
> ✅ Timing closure achieved with positive slack at 62.06 MHz!
| Metric | Value |
|---|---|
| Leakage Power | 0.092 µW |
| Internal Power | 2,150.000 µW |
| Switching Power | 1,610.000 µW |
| Total Power Consumption | 3.760 mW |
Power breakdown: The design achieves excellent power efficiency for edge-vision applications, with total consumption under 4.0 mW at 62.06 MHz — suitable for battery-powered and energy-harvesting deployments.
| Check | Status | Details |
|---|---|---|
| DRC | ✅ CLEAN | 0 violations |
| LVS | ✅ EQUIVALENT | Netlist matches layout |
| Antenna Check | ✅ PASS | 0 violations |
| Metal Density | ✅ COMPLIANT | All layers within SKY130 limits |
Full-chip GDSII layout with core area, IO ring, power network, and clock tree
Full-chip GDSII layout with core area, shown without the power/ground network and clock tree
Transistor-level zoom of standard cell implementation
Right-side I/O pad ring with signal, power, and ground pads
| Test | Method | Result |
|---|---|---|
| RV32I Instruction BIST | Simulation + Hardware | 100% PASS (61/61 instructions) |
| IPU Operation Tests | Simulation + Hardware | 100% PASS (7/7 operations) |
| Performance Counters | Simulation | 100% PASS (all counters match) |
| TinyML Finger Count | Hardware | 80% accuracy (real-time) |
| Operation | Cycles | Time (ms) | Max FPS |
|---|---|---|---|
| Grayscale | 65,538 | 0.655 | 1,525 |
| Threshold | 65,538 | 0.655 | 1,525 |
| Sobel Edge | 65,538 | 0.655 | 1,525 |
| Gaussian Blur | 81,922 | 0.819 | 1,220 |
| Sharpen | 81,922 | 0.819 | 1,220 |
| Max Pool | 81,922 | 0.819 | 1,220 |
| Avg Pool | 20,482 | 0.205 | 4,882 |
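The time and FPS columns follow from the cycle counts at the 100 MHz FPGA clock. The short Python check below reproduces them. It also notes, purely as an observation from the numbers, that each cycle count equals a small integer multiple of the pixel count plus 2; no claim is made about how the IPU state machine actually spends those cycles.

```python
CLOCK_HZ = 100_000_000          # Basys-3 onboard oscillator
PIXELS = 128 * 128

CYCLES = {
    "Grayscale": 65_538,
    "Threshold": 65_538,
    "Sobel Edge": 65_538,
    "Gaussian Blur": 81_922,
    "Sharpen": 81_922,
    "Max Pool": 81_922,
    "Avg Pool": 20_482,
}


def frame_time_ms(cycles, clock_hz=CLOCK_HZ):
    """Processing time for one frame in milliseconds."""
    return cycles / clock_hz * 1e3


def max_fps(cycles, clock_hz=CLOCK_HZ):
    """Whole frames per second, truncated as in the table."""
    return clock_hz // cycles


for name, cycles in CYCLES.items():
    print(f"{name:14s} {frame_time_ms(cycles):.3f} ms  {max_fps(cycles):5d} FPS")
```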
| Metric | Value |
|---|---|
| Camera Capture Rate | 30 FPS (OV7670 native) |
| Processing Rate (Sobel) | 60 FPS (frame-doubled) |
| Display Refresh Rate | 60 Hz (VGA 640×480) |
| End-to-End Latency | 16.7 ms |
| Frame Drop Rate | 0.0% |
| Metric | FPGA (Artix-7) | ASIC (SkyWater 130nm) |
|---|---|---|
| Frequency | ~70 MHz (timing limited) | 62.06 MHz (clean closure) |
| Power | 216 mW | 3.76 mW |
| Area | 16,547 LUTs | 0.5776 mm² die |
| Technology | 28nm FPGA fabric | 130nm CMOS |
| BRAM | 30 tiles (60%) | SRAM macros |
| Platform | Tech. | Pipeline | Vision Accel. | AI Accel. | Open | Vision I/O |
|---|---|---|---|---|---|---|
| EVPIX-RV32 (This work) | 130nm | 5-stage | IPU (7 ops) | TinyML | Yes | OV7670/VGA |
| PULPino | 65nm | 4-stage | None | None | Yes | None |
| Ibex | 22nm | 2-stage | None | None | Yes | None |
| Rocket Chip | 45nm | 5-stage | RoCC only | None | Yes | None |
| GAP8 | 55nm | 1+8 cluster | HWCE | 8-core CNN | Partial | PulpCam |
| Eyeriss | 65nm | — | None | CNN | No | None |
| TinyVers | 22nm | 2-stg+NPU | None | Reconf. NPU | No | None |
| Commercial Edge AI | 40-90nm | Cortex-M | None | NPU | No | None |
evpix_rv32/
├── README.md
├── Makefile
├── asic/
│   ├── flow/
│   │   ├── rtl_files.mk
│   │   ├── asap7/
│   │   │   ├── config.mk
│   │   │   └── constraint.sdc
│   │   └── sky130hd/
│   │       ├── config.mk
│   │       └── constraint.sdc
│   ├── rtl_src/
│   │   ├── asic/
│   │   │   ├── evpix_asic_core_top.sv
│   │   │   └── rv32i_core_asic_extmem.sv
│   │   └── common/
│   │       ├── adder.sv
│   │       ├── alu.sv
│   │       ├── alu_control.sv
│   │       ├── branch_unit.sv
│   │       ├── datapath.sv
│   │       ├── decode_stage.sv
│   │       ├── evpix_finger_model_pkg.sv
│   │       ├── evpix_ml_feature_extractor.sv
│   │       ├── evpix_tinyml_classifier.sv
│   │       ├── ex_mem_reg.sv
│   │       ├── execute_stage.sv
│   │       ├── fetch_stage.sv
│   │       ├── forwarding_unit.sv
│   │       ├── hazard_detection_unit.sv
│   │       ├── id_ex_reg.sv
│   │       ├── if_id_reg.sv
│   │       ├── imm_generator.sv
│   │       ├── instruction_memory_fpga.sv
│   │       ├── ipu_fpga.sv
│   │       ├── main_control.sv
│   │       ├── mem_wb_reg.sv
│   │       ├── program_counter.sv
│   │       ├── register_file.sv
│   │       └── writeback_stage.sv
│   ├── scripts/
│   │   ├── 00_RUN_ME_FIRST_SKY130_FULL.sh
│   │   ├── 00_linux_first_steps.sh
│   │   ├── 01_check_tools.sh
│   │   ├── 02_install_designs_into_orfs.sh
│   │   ├── 05_FIX_ORFS_READ_VERILOG_TCL.sh
│   │   ├── 10_run_sky130hd.sh
│   │   ├── 20_run_asap7.sh
│   │   ├── 30_collect_reports.sh
│   │   ├── 40_yosys_synth_only.sh
│   │   ├── 50_RUN_ASAP7_AFTER_SKY130.sh
│   │   ├── 70_MAKE_GDSII_COPY.sh
│   │   ├── 80_VIEW_LAYOUT.sh
│   │   ├── 81_EXPORT_LAYOUT_SCREENSHOT.sh
│   │   ├── 85_WRITE_GDS_FROM_FILLED_ODB.sh
│   │   ├── 98_KILL_STUCK_EVPIX_FLOW.sh
│   │   ├── 99_SHOW_LAST_ERROR.sh
│   │   └── yosys_synth_only.ys
│   └── signoff/
│       └── evpix_asic_sky130hd_GDSII.txt
├── docs/
│   └── documentation.pdf
├── fpga/
│   ├── bitstream/
│   │   └── evpix_rv32_top.bit
│   ├── constrains/
│   │   └── evpix_basys3.xdc
│   ├── rtl_src/
│   │   ├── adder.sv
│   │   ├── alu.sv
│   │   ├── alu_control.sv
│   │   ├── branch_unit.sv
│   │   ├── data_memory_fpga.sv
│   │   ├── datapath.sv
│   │   ├── decode_stage.sv
│   │   ├── evpix_finger_model_pkg.sv
│   │   ├── evpix_ml_feature_extractor.sv
│   │   ├── evpix_tinyml_classifier.sv
│   │   ├── evpix_top_ov7670_direct.sv
│   │   ├── evpix_vga_frame_display_db.sv
│   │   ├── ex_mem_reg.sv
│   │   ├── execute_stage.sv
│   │   ├── fetch_stage.sv
│   │   ├── forwarding_unit.sv
│   │   ├── hazard_detection_unit.sv
│   │   ├── id_ex_reg.sv
│   │   ├── if_id_reg.sv
│   │   ├── imm_generator.sv
│   │   ├── instruction_memory_fpga.sv
│   │   ├── ipu_fpga.sv
│   │   ├── main_control.sv
│   │   ├── mem_wb_reg.sv
│   │   ├── memory_stage.sv
│   │   ├── ov7670_capture_128_rgb565_to_rgb888.sv
│   │   ├── ov7670_sccb_init.sv
│   │   ├── program_counter.sv
│   │   ├── register_file.sv
│   │   ├── rv32i_core_fpga.sv
│   │   ├── vga_640x480.sv
│   │   └── writeback_stage.sv
│   └── testbench/
│       ├── memfile.hex
│       ├── memfile_ipu_system.hex
│       ├── memfile_pix.hex
│       ├── memfile_rv32i.hex
│       ├── tb_ipu_system.sv
│       ├── tb_rv32i_ipu_custom.sv
│       └── tb_rv32i_top.sv
├── simulation/
│   ├── rtl_src/
│   │   ├── adder.sv
│   │   ├── alu.sv
│   │   ├── alu_control.sv
│   │   ├── branch_unit.sv
│   │   ├── data_memory.sv
│   │   ├── datapath.sv
│   │   ├── decode_stage.sv
│   │   ├── evpix_top.sv
│   │   ├── ex_mem_reg.sv
│   │   ├── execute_stage.sv
│   │   ├── fetch_stage.sv
│   │   ├── forwarding_unit.sv
│   │   ├── hazard_detection_unit.sv
│   │   ├── id_ex_reg.sv
│   │   ├── if_id_reg.sv
│   │   ├── imm_generator.sv
│   │   ├── instruction_memory_fpga.sv
│   │   ├── ipu.sv
│   │   ├── main_control.sv
│   │   ├── mem_wb_reg.sv
│   │   ├── memory_stage.sv
│   │   ├── program_counter.sv
│   │   ├── register_file.sv
│   │   ├── rv32i_core.sv
│   │   └── writeback_stage.sv
│   └── testbench/
│       ├── memfile.hex
│       ├── memfile_ipu_system.hex
│       ├── memfile_pix.hex
│       ├── memfile_rv32i.hex
│       ├── tb_ipu_system.sv
│       ├── tb_rv32i_ipu_custom.sv
│       └── tb_rv32i_top.sv
└── images/
    └── (90+ images — diagrams, ASIC layouts, FPGA results, etc.)
This guide walks you through setting up your Linux environment and running the three main design flows:
- Simulation (Vivado xsim)
- FPGA (Basys-3 + OV7670)
- ASIC (OpenROAD Flow Scripts)
Follow the steps in order.
The simulation flow uses Vivado's built-in simulator (xsim) and sources RTL from the simulation/ directory.
| Tool | Purpose | Install Command |
|---|---|---|
| make | Build automation | sudo apt install -y make |
| build-essential | Compilers & development tools | sudo apt install -y build-essential |
| Xilinx Vivado | Simulation, Synthesis & FPGA | See §1.2 |
Vivado is required for both simulation and FPGA flows.
The free WebPACK edition supports Artix-7 devices.
- Create a free AMD/Xilinx account at https://www.xilinx.com
- Download the AMD Unified Installer for FPGAs & Adaptive SoCs
- Linux Self-Extracting Web Installer (~290 MB)
- Available from the Vivado download page.
sudo apt update
sudo apt install -y \
libtinfo-dev \
libncurses-dev \
libglib2.0-dev \
libgtk2.0-dev \
zlib1g \
python3-dev \
python3-pip \
default-jre \
default-jdk \
libswt-gtk-4-jni \
locales \
tar \
gzip \
gcc \
g++ \
make \
build-essential

# Navigate to your download directory
chmod +x FPGAs_AdaptiveSoCs_Unified_2024.1_0522_2023_Lin64.bin
sh FPGAs_AdaptiveSoCs_Unified_2024.1_0522_2023_Lin64.bin

During installation:
- Select Vivado ML Standard Edition
- Select Vivado
- (Optional) Install Vitis
- Select Artix-7 devices only
- Choose an installation directory
Recommended:
/tools/Xilinx/
or
/opt/Xilinx/
Add the following line to your ~/.bashrc.
Adjust the version/path if necessary.
source /tools/Xilinx/Vivado/2024.1/settings64.sh

Reload your shell.

source ~/.bashrc

Verify the installation:

vivado -version
xvlog -version
xelab -version
xsim -version

If you encounter
libtinfo.so.5: cannot open shared object file
run
sudo apt install libtinfo-dev
sudo ln -s \
/lib/x86_64-linux-gnu/libtinfo.so.6 \
/lib/x86_64-linux-gnu/libtinfo.so.5

git clone https://github.com/aukhalid/evpix_rv32.git
cd evpix_rv32
# Create the build directory tree
make setup

Run make setup only once after cloning.

make sim_core
make sim_ipu
make sim_custom
make sim_all

Logs
build/sim/logs/
Waveforms (when WAVES=1)
build/sim/xsim.dir/
Memory HEX files are automatically copied from
simulation/testbench/
make sim_core WAVES=1

This launches the Vivado xsim GUI.
The FPGA flow uses Vivado in batch mode and sources RTL from the fpga/ directory.
| Hardware | Details |
|---|---|
| Digilent Basys-3 | Xilinx Artix-7 FPGA (xc7a35tcpg236-1) |
| OV7670 Camera | With breakout board, connected via PMOD |
| VGA Monitor | Standard VGA |
| USB-A → Micro-B Cable | Programming & power |
Note: The free Vivado WebPACK license fully supports the Artix-7 on the Basys-3.
Run the complete flow:
make fpga_all

Or execute each step individually.

make fpga_synth_only

Produces
post_synth.dcp

make fpga_impl_only

Produces
post_route.dcp

make fpga_bit_only

Checkpoints
build/fpga/synth/
build/fpga/impl/
Bitstream
build/fpga/bitstream/evpix_rv32_top.bit
Reports
build/fpga/reports/
Includes:
- Timing
- Utilization
- Power
make fpga_program

This flashes the bitstream over JTAG.
- Connect the Basys-3 using USB
- Connect the OV7670 camera to the PMOD header
- Connect a VGA monitor
- Power on the board
- Use onboard switches to select operating mode
Constraint file:
fpga/constrains/evpix_basys3.xdc
make fpga_all \
FPGA_TOP=your_custom_top \
FPGA_PART=xc7a35tcpg236-1

The ASIC flow uses OpenROAD Flow Scripts (ORFS) with either the SKY130HD or the ASAP7 platform to perform a complete RTL-to-GDSII physical implementation.
| Tool | Purpose | Install |
|---|---|---|
| git | Version Control | sudo apt install -y git |
| python3-venv python3-pip python3-yaml | Python tooling | sudo apt install -y python3-venv python3-pip python3-yaml |
| OpenROAD Flow Scripts | Complete RTL-to-GDSII Flow | See §3.2 |
| KLayout | GDS Viewer | Installed by ORFS |
Minimum
- 1 CPU core
- 8 GB RAM
Recommended
- 4+ CPU cores
- 16+ GB RAM
- 100+ GB free disk space
mkdir -p ~/Work/vlsi/tools
cd ~/Work/vlsi/tools
git clone --recursive \
https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts
cd OpenROAD-flow-scripts

sudo ./setup.sh

This installs
- CMake
- Boost
- Lemon
- SWIG
- Eigen
- Yosys dependencies
- KLayout
./build_openroad.sh --local

Compilation typically takes 20–60 minutes depending on hardware.
Add to your ~/.bashrc
source ~/Work/vlsi/tools/OpenROAD-flow-scripts/env.sh

Reload

source ~/.bashrc

yosys -help
openroad -help

cd ~/Work/vlsi/tools/OpenROAD-flow-scripts/flow
make
make gui_final

This launches the OpenROAD GUI.
The Makefile assumes
~/OpenROAD-flow-scripts
Override with
ORFS_DIR=
if necessary.
make asic_setup_check
make asic_install
make asic_synth_only
make asic_all

Typically takes 20–60 minutes

make asic_asap7

Logs
build/asic/logs/
Reports
build/asic/reports/qor_summary.txt
GDSII
build/asic/gds/
Automatically copied using
asic_gds_copy
make asic_view

Opens the GDSII in KLayout.

make asic_report

Prints
- Area
- Timing
- Power
make asic_kill

make asic_all \
ORFS_DIR=/custom/path \
PLATFORM=sky130hd

Remove simulation artifacts
make clean

Remove simulation, FPGA and ASIC artifacts
make clean_all

Remove the entire build directory
make distclean

| Target | Description |
|---|---|
| make help | Show all available targets |
| make setup | Create build directory tree |
| make sim_core | Run RV32I simulation |
| make sim_ipu | Run IPU simulation |
| make sim_custom | Run Custom ISA simulation |
| make sim_all | Run all simulations |
| make fpga_synth_only | FPGA synthesis |
| make fpga_impl_only | FPGA implementation |
| make fpga_bit_only | FPGA bitstream |
| make fpga_all | Complete FPGA flow |
| make fpga_program | Program Basys-3 |
| make asic_setup_check | Verify ORFS installation |
| make asic_install | Install project into ORFS |
| make asic_synth_only | ASIC synthesis |
| make asic_all | Complete ASIC flow |
| make asic_asap7 | ASAP7 flow |
| make asic_report | QoR summary |
| make asic_view | View layout |
| make asic_kill | Stop running flow |
make help

build/sim/logs/
build/fpga/reports/
build/asic/logs/

For the latest ASIC error
make asic_last_error

- Compiler Support — GCC/LLVM backend with custom instruction intrinsics
- DMA Engine — Lightweight DMA for autonomous data movement
- Higher Resolution — QVGA (320×240) and VGA (640×480) support with external memory
- Advanced IPU Kernels — Morphological operations, histogram equalization, motion estimation
- Native TinyML — Port finger-counting model to run entirely on EVPIX-RV32 CPU/IPU
- Formal Verification — SVA assertions and model checking for pipeline correctness
- Advanced Node ASIC — Migration to 65nm or 28nm for area/power reduction
- Multi-Core Extension — Dual-core heterogeneous configuration
If you use EVPIX-RV32 in your research, please cite:
@thesis{khalid2026evpix,
author = {Khalid, Ahasan Ullah},
title = {{EVPIX-RV32: 5-Stage Custom RISC-V SoC with Integrated IPU and TinyML Support for Real-Time Edge-Vision AI Acceleration}},
school = {Chittagong University of Engineering and Technology},
year = {2026},
type = {Bachelor's Thesis},
department = {Electronics and Telecommunication Engineering},
address = {Chattogram-4349, Bangladesh}
}

- Department: Department of Electronics & Telecommunication Engineering
- Supervisor: Md. Farhad Hossain, Assistant Professor, ETE Department, CUET
- Special Thanks: Arif Istiaque, Assistant Professor, ETE Department, CUET
- Institution: Chittagong University of Engineering & Technology (CUET)
- Open-Source Community: RISC-V International, OpenROAD Project, SkyWater PDK, YosysHQ
- Tools: Xilinx Vivado, Digilent Basys-3 FPGA, OpenROAD-flow-scripts, Magic, KLayout
This project is licensed under the Apache 2.0 License — see the LICENSE file for details.
The RISC-V ISA is an open standard maintained by RISC-V International. The SkyWater 130nm PDK is provided under the Apache 2.0 license by Google and SkyWater Technology.
Built with ❤️ at CUET | Open Source | Open Silicon | Open Education




































