AICL-Lab/tiny-dl-inference

Tiny-DL-Inference

A High-Performance WebGPU Deep Learning Inference Engine

⚡ Zero Dependencies · 🔥 Kernel Fusion · 🎨 Hand-Written WGSL · 🚀 Zero-Copy

Quick Start · Performance · Documentation · Playground · Contributing

English | Simplified Chinese


Why Tiny-DL-Inference?

The smallest, most transparent deep learning inference engine for the web.

| Feature | Tiny-DL-Inference | TensorFlow.js | ONNX Runtime Web |
|---|---|---|---|
| Bundle Size | 58KB | ~2MB | ~1.5MB |
| Dependencies | Zero | Heavy | Moderate |
| Code Transparency | 100% WGSL source | Black box | Black box |
| GPU Control | Direct shader access | Abstracted | Abstracted |
| Kernel Fusion | ✅ Manual fusion | Limited | Limited |

Built for developers who want full control, minimal overhead, and maximum understanding of GPU-based neural network inference.


Features

🚀 Performance

  • Zero Dependencies: no TensorFlow.js or ONNX Runtime; pure WebGPU with a minimal footprint
  • Kernel Fusion: fused Conv2d+Bias+ReLU cuts memory operations from 6 to 2 (a 3× reduction in memory traffic)
  • Zero-Copy Operations: tensor views with no GPU overhead (reshape takes < 1 μs)
  • Hand-Written WGSL: every operator is implemented from scratch in readable WGSL

🛠 Developer Experience

  • Type Safe: full TypeScript in strict mode, with no use of the any type
  • Comprehensive Testing: property-based testing with fast-check (100+ iterations each)
  • Production Ready: custom error classes and proper GPU resource lifecycle management
  • Educational: well suited to studying GPU computing and WebGPU programming

Quick Start

Requirements

  • Browser: Chrome 113+ / Edge 113+ / Safari 18+ (with WebGPU enabled)
  • Hardware: GPU with WebGPU support (discrete GPU recommended for best performance)
  • Node.js: 18.0+ (for development)

Installation

npm install tiny-dl-inference

🚀 Try it Online

Open in StackBlitz

First Inference

import { GPUContext, Tensor, ReLUOperator } from 'tiny-dl-inference';

// 1. Initialize GPU context
const context = new GPUContext();
await context.init();

// 2. Create input tensor
const input = Tensor.fromArray(context, 
  new Float32Array([1.0, -2.0, 3.0, -4.0]),
  [1, 4, 1, 1]  // [batch, channels, height, width]
);

// 3. Run ReLU activation
const relu = new ReLUOperator(context);
const output = await relu.forward([input]);

// 4. Get results
const result = await output.download();
console.log(result); // Float32Array([1, 0, 3, 0])

// 5. Cleanup resources
input.destroy();
output.destroy();
context.destroy();
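
The shape comment above names the dimension order as [batch, channels, height, width]. As a minimal sketch of what that order implies, assuming the data is stored contiguously in row-major order (the README does not document the library's internal layout, and nchwOffset is a hypothetical helper, not part of the package), the flat index of an element can be computed like this:

```typescript
// Flat offset of element (n, c, h, w) in a densely packed NCHW buffer.
// Assumption: row-major storage with width varying fastest.
function nchwOffset(
  dims: [number, number, number, number],
  n: number,
  c: number,
  h: number,
  w: number
): number {
  const [, channels, height, width] = dims;
  return ((n * channels + c) * height + h) * width + w;
}

// For the [1, 4, 1, 1] tensor above, channel 2 sits at flat index 2.
console.log(nchwOffset([1, 4, 1, 1], 0, 2, 0, 0)); // 2
```

Under this assumption, the Float32Array passed to Tensor.fromArray lists all pixels of channel 0 first, then channel 1, and so on.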

Using InferenceEngine (High-Level API)

import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

// Initialize engine
const context = new GPUContext();
await context.init();

const engine = new InferenceEngine(context);

// Load model from JSON
await engine.loadModel('https://example.com/mnist-model.json');

// Prepare input (MNIST: 1x1x28x28); imageData is a Float32Array of 784 pixel values
const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);

// Run inference
const output = await engine.infer(input);
const predictions = await output.download();

// Get predicted class
const predictedClass = predictions.indexOf(Math.max(...predictions));
console.log('Predicted digit:', predictedClass);

// Cleanup
input.destroy();
output.destroy();
engine.dispose();
context.destroy();
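
The predicted class above is found with predictions.indexOf(Math.max(...predictions)), which scans the array twice and spreads every element as a function argument. That is fine for ten classes. For much larger outputs, a single-pass loop is a possible alternative. The helper below is a hypothetical sketch and is not part of the library:

```typescript
// Index of the largest value, found in one pass without spreading.
// On ties, the first occurrence wins (same as indexOf on the maximum).
function argmax(values: ArrayLike<number>): number {
  if (values.length === 0) {
    throw new RangeError('argmax of an empty array is undefined');
  }
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

console.log(argmax(new Float32Array([0.1, 0.7, 0.2]))); // 1
```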

→ Read the Full Documentation for detailed guides and examples.


Performance

Kernel Fusion: 3× Memory Bandwidth Reduction

Without Fusion (6 memory operations):
  Read → Conv → Write → Read → Bias → Write → Read → ReLU → Write

With Fusion (2 memory operations):
  Read → Conv+Bias+ReLU → Write

| Benchmark | Separate Operators | Fused Operator | Improvement |
|---|---|---|---|
| Conv2d 64-channel | 2.34 ms | 0.89 ms | 2.6× faster |
| Memory Operations | 6 ops | 2 ops | 3× reduction |
| Kernel Launches | 3 | 1 | 67% fewer |
| Intermediate Tensors | 3 allocated | 0 | 100% saved |
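
To make the read/write counts above concrete, here is a CPU analogy rather than the project's WGSL kernel. It covers only the bias and ReLU stages applied to an already computed convolution output, and the function names are hypothetical. Each full loop over an array stands in for one kernel launch that reads from and writes to GPU memory:

```typescript
// Unfused: two passes over memory and one intermediate buffer.
function biasThenRelu(convOut: Float32Array, bias: number): Float32Array {
  const biased = new Float32Array(convOut.length); // intermediate tensor
  for (let i = 0; i < convOut.length; i++) {
    biased[i] = convOut[i] + bias; // pass 1: read convOut, write biased
  }
  const out = new Float32Array(convOut.length);
  for (let i = 0; i < convOut.length; i++) {
    out[i] = Math.max(0, biased[i]); // pass 2: read biased, write out
  }
  return out;
}

// Fused: one pass, and the biased value never leaves a local variable.
function fusedBiasRelu(convOut: Float32Array, bias: number): Float32Array {
  const out = new Float32Array(convOut.length);
  for (let i = 0; i < convOut.length; i++) {
    out[i] = Math.max(0, convOut[i] + bias); // read convOut, write out
  }
  return out;
}

const sample = new Float32Array([1, -2, 3, -4]);
console.log(Array.from(biasThenRelu(sample, 0.5))); // [1.5, 0, 3.5, 0]
console.log(Array.from(fusedBiasRelu(sample, 0.5))); // [1.5, 0, 3.5, 0]
```

Both versions produce the same values. The fused one simply avoids writing the intermediate result out and reading it back, which is the saving the table reports.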

Zero-Copy Reshape

// Zero GPU overhead - creates a view, not a copy
const flat = tensor.reshape([1, 2352]);  // < 1 microsecond
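
A reshape can be this cheap because a view only needs new shape metadata and can keep pointing at the same storage. The following is a minimal sketch of that idea using a plain Float32Array in place of a GPU buffer. TensorView is a hypothetical class for illustration and is not the library's Tensor:

```typescript
// A shape plus a reference to shared storage.
class TensorView {
  readonly storage: Float32Array;
  readonly dims: readonly number[];

  constructor(storage: Float32Array, dims: readonly number[]) {
    const count = dims.reduce((a, b) => a * b, 1);
    if (count !== storage.length) {
      throw new Error(
        `Shape [${dims.join(', ')}] needs ${count} elements, got ${storage.length}`
      );
    }
    this.storage = storage;
    this.dims = dims;
  }

  // Returns a new view over the same storage; no element is copied.
  reshape(dims: readonly number[]): TensorView {
    return new TensorView(this.storage, dims);
  }
}

const imageView = new TensorView(new Float32Array(3 * 28 * 28), [1, 3, 28, 28]);
const flatView = imageView.reshape([1, 2352]);
console.log(flatView.storage === imageView.storage); // true
```

Because both views share storage, a write through one is visible through the other. That is the usual trade-off of zero-copy views.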

First Inference Latency

| Model | Latency | Device |
|---|---|---|
| MNIST CNN | < 100 ms | Chrome 120, RTX 3060 |
| CIFAR-10 | < 150 ms | Chrome 120, RTX 3060 |

Supported Operators

Convolution

| Operator | Description | Fusion Available |
|---|---|---|
| Conv2d | 2D Convolution with stride/padding | ✅ Fused with Bias+ReLU |
| Conv2dBiasReLU | Conv + Bias + ReLU in single kernel | ✅ 3× memory reduction |
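
Stride and padding determine the spatial size of a convolution's output. The sketch below uses the standard formula floor((in + 2 × padding − kernel) / stride) + 1 and assumes no dilation. convOutputSize is a hypothetical helper, and the README does not state which options the library's Conv2d accepts:

```typescript
// Output length along one spatial axis for a convolution without dilation.
function convOutputSize(
  inSize: number,
  kernel: number,
  stride: number = 1,
  padding: number = 0
): number {
  const span = inSize + 2 * padding - kernel;
  if (span < 0) {
    throw new RangeError('Kernel is larger than the padded input');
  }
  return Math.floor(span / stride) + 1;
}

console.log(convOutputSize(28, 5)); // 24 (5x5 kernel, no padding)
console.log(convOutputSize(28, 3, 1, 1)); // 28 (3x3 kernel, padding 1 keeps the size)
console.log(convOutputSize(28, 2, 2)); // 14 (2x2 window, stride 2 halves the size)
```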

Pooling

| Operator | Description |
|---|---|
| MaxPool | 2D Max Pooling with configurable kernel size |
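
For reference, max pooling over a single channel can be sketched on the CPU as follows. This is an illustration, not the library's WGSL kernel or its CPU reference. It assumes a square window, no padding, and a stride that defaults to the kernel size, and maxPool2d is a hypothetical name:

```typescript
// Max pooling over one height x width plane stored in row-major order.
function maxPool2d(
  plane: Float32Array,
  height: number,
  width: number,
  kernel: number,
  stride: number = kernel
): { data: Float32Array; height: number; width: number } {
  const outH = Math.floor((height - kernel) / stride) + 1;
  const outW = Math.floor((width - kernel) / stride) + 1;
  const data = new Float32Array(outH * outW);
  for (let oy = 0; oy < outH; oy++) {
    for (let ox = 0; ox < outW; ox++) {
      let best = -Infinity;
      // Scan the kernel x kernel window anchored at (oy * stride, ox * stride).
      for (let ky = 0; ky < kernel; ky++) {
        for (let kx = 0; kx < kernel; kx++) {
          const v = plane[(oy * stride + ky) * width + (ox * stride + kx)];
          if (v > best) best = v;
        }
      }
      data[oy * outW + ox] = best;
    }
  }
  return { data, height: outH, width: outW };
}

// 4x4 plane holding 0..15, pooled with a 2x2 window.
const pooled = maxPool2d(Float32Array.from({ length: 16 }, (_, i) => i), 4, 4, 2);
console.log(Array.from(pooled.data)); // [5, 7, 13, 15]
```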

Activation Functions

| Operator | Description | Formula |
|---|---|---|
| ReLU | Rectified Linear Unit | f(x) = max(0, x) |
| Softmax | Normalized exponential (numerically stable) | f(x_i) = e^(x_i) / Σ_j e^(x_j) |
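
"Numerically stable" usually means subtracting the maximum input before exponentiating. This leaves the result unchanged mathematically but keeps e^(x_i) from overflowing. A minimal CPU sketch of that common technique follows. It is not taken from the library's shader, and stableSoftmax is a hypothetical name:

```typescript
// Softmax with the max-subtraction trick: exp() only ever sees values <= 0.
function stableSoftmax(logits: Float32Array): Float32Array {
  let maxLogit = -Infinity;
  for (let i = 0; i < logits.length; i++) {
    if (logits[i] > maxLogit) maxLogit = logits[i];
  }
  const out = new Float32Array(logits.length);
  let sum = 0;
  for (let i = 0; i < logits.length; i++) {
    out[i] = Math.exp(logits[i] - maxLogit);
    sum += out[i];
  }
  for (let i = 0; i < out.length; i++) {
    out[i] /= sum;
  }
  return out;
}

// A naive e^1000 overflows to Infinity; the shifted version stays finite.
console.log(Array.from(stableSoftmax(new Float32Array([1000, 1000])))); // [0.5, 0.5]
```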

Fully Connected

| Operator | Description |
|---|---|
| Dense | Fully connected layer with optional bias |
| Flatten | Zero-copy tensor reshaping |
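
A fully connected layer computes y = Wx + b. The CPU sketch below shows the arithmetic under one assumption that the README does not confirm: weights are stored row-major as [outFeatures, inFeatures]. denseForward is a hypothetical helper:

```typescript
// y[o] = sum_i weights[o * inFeatures + i] * x[i] + bias[o]
// Assumption: weights are laid out row-major as [outFeatures, inFeatures].
function denseForward(
  x: Float32Array,
  weights: Float32Array,
  outFeatures: number,
  bias?: Float32Array
): Float32Array {
  const inFeatures = x.length;
  if (weights.length !== outFeatures * inFeatures) {
    throw new Error('Weight count does not match outFeatures * inFeatures');
  }
  const y = new Float32Array(outFeatures);
  for (let o = 0; o < outFeatures; o++) {
    let acc = bias ? bias[o] : 0; // bias is optional, as in the table above
    for (let i = 0; i < inFeatures; i++) {
      acc += weights[o * inFeatures + i] * x[i];
    }
    y[o] = acc;
  }
  return y;
}

const denseOut = denseForward(
  new Float32Array([1, 2]),
  new Float32Array([1, 0, 0, 1, 1, 1]), // rows: [1,0], [0,1], [1,1]
  3,
  new Float32Array([0.5, 0, -1])
);
console.log(Array.from(denseOut)); // [1.5, 2, 2]
```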

Complete Example: MNIST Classification

import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

async function classifyMNIST(imageData: Float32Array): Promise<number> {
  const context = new GPUContext();
  
  try {
    await context.init();
    const engine = new InferenceEngine(context);
    await engine.loadModel('mnist-model.json');
    
    // Input: 1x1x28x28 (grayscale MNIST)
    const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);
    
    // Run inference
    const output = await engine.infer(input);
    const predictions = await output.download();
    
    // Get result
    const predictedDigit = predictions.indexOf(Math.max(...predictions));
    
    // Cleanup
    input.destroy();
    output.destroy();
    engine.dispose();
    
    return predictedDigit;
  } finally {
    // Ensure GPU resources are released even if an error occurs
    context.destroy();
  }
}

// Usage
const imageData = new Float32Array(784); // 28x28 pixel data
classifyMNIST(imageData)
  .then(digit => console.log('Recognized digit:', digit))
  .catch(err => console.error('Inference failed:', err));

→ See more Examples including custom models, web integration, and performance benchmarking.


Browser Compatibility

| Browser | Minimum Version | Status |
|---|---|---|
| Chrome | 113+ | ✅ Fully Supported |
| Edge | 113+ | ✅ Fully Supported |
| Safari | 18+ (macOS Sonoma+) | ⚠️ Experimental |
| Firefox | Behind flag | 🔧 Enable dom.webgpu.enabled |

Check WebGPU Support

if (navigator.gpu) {
  console.log('✅ WebGPU is supported!');
} else {
  console.error('❌ WebGPU not supported in this browser');
}
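
The presence of navigator.gpu shows that the API is exposed, but requestAdapter() can still resolve to null when no suitable GPU is available. A slightly stricter check might look like the sketch below. The AdapterSource interface is a minimal stand-in so the snippet compiles without WebGPU type definitions, and hasUsableWebGPU is a hypothetical helper, not a library function:

```typescript
// Minimal structural type covering the one WebGPU method used here.
interface AdapterSource {
  requestAdapter(): Promise<object | null>;
}

// Resolves to true only if the API exists and an adapter was returned.
async function hasUsableWebGPU(gpu: AdapterSource | undefined): Promise<boolean> {
  if (!gpu) return false;
  try {
    return (await gpu.requestAdapter()) !== null;
  } catch {
    return false; // treat a rejected request as "not usable"
  }
}

// In a browser: hasUsableWebGPU(navigator.gpu).then((ok) => console.log(ok));
```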

Project Structure

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    Application Layer                        │
│              (InferenceEngine, ModelLoader)                 │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                    Operator Layer                           │
│    (Conv2d, ReLU, MaxPool, Dense, Softmax, etc.)            │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                      Core Layer                             │
│         (GPUContext, Tensor, Memory Management)             │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                    WebGPU Runtime                           │
│              (WGSL Shaders, GPU Compute)                    │
└─────────────────────────────────────────────────────────────┘

Directory Layout

tiny-dl-inference/
├── docs/               # User documentation (Bilingual)
│   ├── en/             # English (26 files)
│   └── zh/             # Chinese (27 files)
├── src/                # Source code
│   ├── core/           # GPUContext, Tensor, error classes
│   ├── operators/      # Neural network operators
│   ├── engine/         # InferenceEngine, ModelLoader
│   └── utils/          # Benchmark, CPU reference implementations
├── tests/              # Test suite (Vitest)
└── examples/           # Demo code (MNIST, benchmark)

Development

Setup

# Clone repository
git clone https://github.com/LessUp/tiny-dl-inference.git
cd tiny-dl-inference

# Install dependencies
npm install

# Run type checking
npm run typecheck

# Run tests (134 passing)
npm test

# Build project
npm run build

Testing

# Run all tests
npm test

# Run with coverage report
npm run test:coverage

# Run specific test file
npx vitest run tests/operators/Conv2dOperator.test.ts

# Property-based tests (100+ iterations each)
npx vitest run -t "property"

Test Coverage:

  • ✅ 134 tests passing
  • ✅ 13 property-based tests with fast-check
  • ✅ CPU reference implementations for correctness validation
  • ✅ Target: >90% code coverage (V8)
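
The project's property-based tests use fast-check. To show the general idea without a dependency, here is a hand-rolled randomized check. It is not fast-check and not the project's test code: it generates seeded random inputs and asserts properties that a CPU ReLU reference should always satisfy. All names are hypothetical:

```typescript
// Small deterministic PRNG (mulberry32) so any failure is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// CPU reference for ReLU: f(x) = max(0, x).
function reluReference(x: Float32Array): Float32Array {
  const out = new Float32Array(x.length);
  for (let i = 0; i < x.length; i++) out[i] = Math.max(0, x[i]);
  return out;
}

// Checks three properties on random inputs; returns the number of runs.
function checkReluProperties(iterations: number, seed: number): number {
  const rand = mulberry32(seed);
  for (let run = 0; run < iterations; run++) {
    const len = 1 + Math.floor(rand() * 64);
    const x = new Float32Array(len);
    for (let i = 0; i < len; i++) x[i] = (rand() - 0.5) * 200;
    const once = reluReference(x);
    const twice = reluReference(once);
    for (let i = 0; i < len; i++) {
      if (once[i] < 0) throw new Error(`run ${run}: negative output`);
      if (once[i] !== twice[i]) throw new Error(`run ${run}: not idempotent`);
      if (x[i] > 0 && once[i] !== x[i]) throw new Error(`run ${run}: positive value changed`);
    }
  }
  return iterations;
}

console.log(checkReluProperties(100, 42)); // 100
```

In a real suite, the same properties would compare the GPU operator's downloaded output against the CPU reference within a tolerance, with fast-check handling input generation and shrinking.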

Documentation

📚 Getting Started

🔧 Core Concepts

🚀 Advanced

📖 API Reference

💡 Examples

🧪 Playground


Chinese Documentation

→ Browse Full Documentation: English | Chinese


Contributing

We welcome focused, maintainable contributions. Keep changes small, test-backed, and aligned with the existing TypeScript/WebGPU architecture.

Quick Start

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Implement your change directly in code
  4. Test thoroughly
  5. Update docs if public behavior changes
  6. Submit a Pull Request

Code Style

  • TypeScript strict mode (strict: true)
  • 2-space indentation, single quotes
  • Property-based testing with fast-check
  • Follow existing patterns in /src/operators/

Changelog

See CHANGELOG.md for all releases.

Latest: v2.0.1 (2026-04-16)

Security:

  • Fixed 5 moderate npm vulnerabilities
  • Updated vitest to v4.1.4

Performance:

  • Kernel fusion: 3× memory reduction
  • Zero-copy reshape: < 1 μs overhead
  • GPU memory leak fixes

→ Full Changelog


License

MIT License: free for personal and commercial use.


Built with ❤️ for the AI community