split-weight

Split a single neural network into multiple smaller networks using weight splitting.

⛔️ [DEPRECATED] ⚠️ Sparsity vs. Overhead

Even if the total number of operations (FLOPs) decreases, splitting a single large matrix multiplication (matmul) into several smaller ones often results in a slowdown on modern hardware like GPUs. This is because large matmuls are highly optimized for parallel processing; multiple small calls introduce "kernel launch overhead" and prevent the hardware from reaching peak throughput.

Overview

This project explores an approach to improve inference efficiency in neural networks by decomposing a large model into smaller sub-networks based on weight significance.

Idea

In many neural network tasks, not all inputs strongly influence all outputs. When certain weights are close to zero, their contribution to the final output becomes negligible.

The core idea is:

  • Identify weights that have minimal impact (near zero values)
  • Split the network into smaller sub-networks by grouping significant weights
  • Reduce unnecessary computation during inference by ignoring weak connections

This can make inference in trained models more efficient, especially in scenarios where sparsity naturally emerges during training.
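The first and third bullets can be sketched with NumPy; the threshold value and layer shapes here are illustrative, not part of the project:

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.standard_normal((6, 6))
W[np.abs(W) < 0.8] *= 0.01  # simulate a trained layer with many weak weights

# Identify weights with minimal impact: magnitude below a cutoff.
threshold = 0.1  # hypothetical value; in practice tuned per layer
mask = np.abs(W) >= threshold

# Ignore weak connections by zeroing them out.
W_pruned = W * mask

# The pruned layer closely approximates the original output,
# since the dropped weights contributed little.
x = rng.standard_normal(6)
max_error = np.abs(x @ W - x @ W_pruned).max()
print(f"kept {mask.sum()}/{W.size} weights, max output error {max_error:.4f}")
```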

Visualization

Original Weights

[image: Weights]

Split Weights

[image: Splitting Weights]

How It Works

  1. Train a standard neural network
  2. Analyze the learned weights
  3. Identify near-zero weights (low importance connections)
  4. Partition the network into smaller sub-networks
  5. Use these sub-networks independently or selectively during inference
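Steps 2–4 can be sketched as a connected-components grouping over the significant weights of a single layer. The function name and threshold are hypothetical, and a real implementation would extend this across multiple layers:

```python
import numpy as np

def split_by_significance(W, threshold=0.1):
    """Group output units into sub-networks that share no significant inputs."""
    mask = np.abs(W) >= threshold              # step 3: keep significant weights
    inputs_of = [set(np.flatnonzero(mask[:, o]).tolist())
                 for o in range(W.shape[1])]
    groups = []                                # each group: (input ids, output ids)
    unassigned = set(range(W.shape[1]))
    while unassigned:
        seed = unassigned.pop()
        outs, ins = {seed}, set(inputs_of[seed])
        changed = True
        while changed:                         # grow until closed under input sharing
            changed = False
            for o in list(unassigned):
                if ins & inputs_of[o]:
                    outs.add(o)
                    unassigned.remove(o)
                    ins |= inputs_of[o]
                    changed = True
        groups.append((sorted(ins), sorted(outs)))
    return groups

# A block-diagonal weight matrix splits cleanly into two sub-networks.
W = np.zeros((4, 4))
W[:2, :2] = 1.0
W[2:, 2:] = 1.0
print(split_by_significance(W))  # two (inputs, outputs) groups
```

Each returned group can then be evaluated as an independent sub-network at inference time (step 5).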

Benefits

  • Reduced computation during inference
  • Potential speed improvements
  • Better utilization of sparsity in trained models
  • Modular network structure

Use Cases

  • Edge devices with limited compute
  • Real-time inference systems
  • Sparse neural network optimization