Part-I: Use CUDA to accelerate the operations of a typical convolutional layer in often-used large-scale neural networks. (You can find the description slides here)
Part-II: Accelerate a sparse convolutional layer with CUDA. (You can find the description slides here)
This directory contains the input data for the base program
- /data/filt.txt - Store the values of filters
- /data/filt.coo - Store the values of filters in COO format
- /data/inNeu.txt - Store the values of input neurons
- /data/inNeu.coo - Store the values of input neurons in COO format
This example shows how to use CUDA to accelerate an inner product:
cd ./innerProduct
make
make run
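This is not the provided sample itself, but a sketch of the pattern such a kernel typically uses: grid-stride partial sums, a shared-memory tree reduction per block, and `atomicAdd` to combine block results.

```cuda
// Sketch of a dot-product kernel (illustrative, not the course's code).
// Assumes blockDim.x is a power of two.
__global__ void innerProduct(const float* a, const float* b,
                             float* out, int n) {
    extern __shared__ float partial[];   // one float per thread
    int tid = threadIdx.x;

    // Grid-stride loop: each thread accumulates a partial sum.
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + tid; i < n;
         i += gridDim.x * blockDim.x)
        sum += a[i] * b[i];
    partial[tid] = sum;
    __syncthreads();

    // Tree reduction within the block.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) partial[tid] += partial[tid + s];
        __syncthreads();
    }

    // One atomicAdd per block combines the block results.
    if (tid == 0) atomicAdd(out, partial[0]);
}
```

A launch would pass the dynamic shared-memory size and zero the output first, e.g. `innerProduct<<<blocks, threads, threads * sizeof(float)>>>(dA, dB, dOut, n);` (where `dA`, `dB`, `dOut` are hypothetical device buffers).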
The program in this directory prints the device information:
cd ./device
make
make run
git checkout -t origin/part2
make
make run
- Put the input data in sparse format and reimplement your CUDA kernels
- Use NVIDIA Visual Profiler to analyze and improve your code
- Optimize your CUDA kernels for the sparse format
- Improve the input data format (like using other sparse format rather than COO)
- convLayerCPU() performs the computation in C++ and stores the output in outCPU
- checker() verifies that the values stored in outCPU and outGPU match
- Store your result in outGPU in dense format
- You must pass the check to ensure your result is correct!
- Use nvvp (or nvprof) to measure the kernel execution time and data transfer time
- TA will use TotalExecTime to evaluate your performance
  - DataTransTime = DataHostToDeviceTime + DataDeviceToHostTime
  - TotalExecTime = GPUKernelsExecTime + DataTransTime
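The timing components above can be measured with CUDA events. A minimal sketch, where `dIn`, `hIn`, `dOut`, `hOut`, `bytes`, `outBytes`, and `myKernel` are hypothetical placeholders for your own buffers and kernels:

```cuda
cudaEvent_t t0, t1;
cudaEventCreate(&t0);
cudaEventCreate(&t1);
float h2dMs, kernelMs, d2hMs;

// DataHostToDeviceTime
cudaEventRecord(t0);
cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);
cudaEventRecord(t1);
cudaEventSynchronize(t1);
cudaEventElapsedTime(&h2dMs, t0, t1);

// GPUKernelsExecTime
cudaEventRecord(t0);
myKernel<<<grid, block>>>(dIn, dOut);
cudaEventRecord(t1);
cudaEventSynchronize(t1);
cudaEventElapsedTime(&kernelMs, t0, t1);

// DataDeviceToHostTime
cudaEventRecord(t0);
cudaMemcpy(hOut, dOut, outBytes, cudaMemcpyDeviceToHost);
cudaEventRecord(t1);
cudaEventSynchronize(t1);
cudaEventElapsedTime(&d2hMs, t0, t1);

float dataTransMs = h2dMs + d2hMs;          // DataTransTime
float totalExecMs = kernelMs + dataTransMs; // TotalExecTime
```

These are the same quantities nvvp/nvprof report, so the events mainly serve as a quick in-program cross-check.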
- Completeness (30%)
- Your result is correct (Pass the check) - 5%
- You get speedup compared to convLayerCPU() - 5%
- You use NVIDIA Visual Profiler (NVVP) to help you - 5%
- You utilize the sparsity in either Neurons or Filters - 5%
- Improve the input data format (like using other sparse format rather than COO) - 10%
- Performance Ranking (30%)
- TA will rank your TotalExecTime on the provided server
- The fastest one will get 30% and the last one will get 1%
- Report (40%)
- Description of your implementation and results - 5%
- Show how NVVP helped you find and solve performance issues - 5%
- Discussion on your optimizations and innovations - 20%
- Comparison with Part-I - 5%
- Feedback of this project - 5%
- This is team work: 1 ~ 3 people per team
- Same team members as part-I
- Compress your code and report into one zip file and upload to E3
- Name your package as: LeaderID_FP2.zip
- Each team only needs to upload one package to E3
- Please name your report as: LeaderID_Report_FP2.pdf
- Make sure TA can compile and run your code on the provided server
- Using any CUDA library is forbidden in this project
- Delay is NOT acceptable
- Any plagiarism will result in zero points
- LeNet: Gradient Based Learning Applied to Document Recognition
- AlexNet: ImageNet Classification with Deep Convolutional Neural Networks
- CNN: Stanford CS231n Convolutional Neural Networks for Visual Recognition
- CUDA Tutorial: CUDA C/C++ Basics
- CNN with CUDA: Optimizing Convolution Operations in CUDA with Adaptive Tiling
- GPU Profiling: GPU Performance Analysis and Optimisation
- GPU Profiling: CUDA Profiling Documentation
- Network pruning: Learning both Weights and Connections for Efficient Neural Networks
- Sparsity in Neurons: Cnvlutin: Ineffectual-neuron-free Deep Neural Network Computing
- Sparse data GPU: Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors
- Sparse data with CUDA: Efficient Sparse Matrix-Vector Multiplication on CUDA
TA: Chien-Yu Lin
Email: myislin@gmail.com