High-performance, zero-allocation core library for the Nen ecosystem
Nen Core is a foundational library that provides optimized primitives, utilities, and data structures for building high-performance applications in Zig. Built with Data-Oriented Design (DOD) principles and optimized for ReleaseFast builds, it delivers exceptional performance across all critical paths.
| Operation | Performance | Optimization |
|---|---|---|
| SIMD Operations | 9,330M ops/sec | ReleaseFast + Inline |
| Stack Allocators | 125M allocations/sec | Zero heap allocations |
| Fixed Allocators | 937M operations/sec | Zero-allocation patterns |
| Memory Pools | 77M operations/sec | High-performance block reuse |
| Fast Math | 1,511M ops/sec | Inline approximations |
| RNG Operations | 370M ops/sec | Inline generation |
- SIMD-optimized operations for vector mathematics
- Fast math approximations using bit manipulation
- Zero-allocation patterns for maximum efficiency
- ReleaseFast optimizations with 2.9x to 28x speedups over Debug builds
- Stack-backed allocators for ultra-fast temporary allocations
- Fixed-size allocators for compile-time known sizes
- Memory pools with high-performance block reuse
- Batch allocators with automatic heap fallback
- Cache-friendly memory layouts (Struct of Arrays)
- Batch processing for maximum throughput
- SIMD-optimized vector operations
- Zero-copy data structures
- High-performance batch processing with zero-allocation
- Atomic batch commits for data consistency
- Client-side automatic batching to reduce overhead
- Pre-allocated message buffers for predictable performance
- Consolidated data types across all Nen projects
- Unified error handling and version management
- Common constants and configuration
- Zero dependencies (except Zig toolchain)
- Zig 0.15.1 or later
- No external dependencies required
Add this to your build.zig:
const nen_core = b.dependency("nen-core", .{
    .target = target,
    .optimize = optimize,
});
exe.root_module.addImport("nen-core", nen_core.module("nen-core"));

Or add as a submodule:
git submodule add https://github.com/your-org/nen-core.git

const std = @import("std");
const nen_core = @import("nen-core");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // High-performance stack allocator
    var stack_arena = try nen_core.StackArena.init(allocator, 1024 * 1024);
    defer stack_arena.deinit();

    // Allocate temporary data
    const data = try stack_arena.alloc(u8, 1000);

    // SIMD-optimized vector operations
    const a = [_]f32{ 1.0, 2.0, 3.0, 4.0 };
    const b = [_]f32{ 2.0, 3.0, 4.0, 5.0 };
    var result = [_]f32{ 0.0, 0.0, 0.0, 0.0 };
    nen_core.SIMDOperations.addVectors(&a, &b, &result);

    // Fast math approximations
    const fast_exp = nen_core.FastMath.fastExp(2.0);
    const fast_ln = nen_core.FastMath.fastLn(2.0);

    // High-performance RNG
    var rng = nen_core.XorShift32.init(42);
    const random_value = rng.next();

    // Every local is used below; Zig rejects unused constants at compile time
    std.debug.print("Allocated: {d} bytes\n", .{data.len});
    std.debug.print("Result: {any}\n", .{result});
    std.debug.print("Fast exp(2): {d}\n", .{fast_exp});
    std.debug.print("Fast ln(2): {d}\n", .{fast_ln});
    std.debug.print("Random: {}\n", .{random_value});
}

const std = @import("std");
const nen_core = @import("nen-core");

pub fn batchExample() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Create batch processor
    var processor = try nen_core.BatchProcessor.init(allocator);
    defer processor.deinit();

    // Add operations to batch
    try processor.addOperation(.data_write, "Hello, World!");
    try processor.addOperation(.data_read, "user:123");
    try processor.addOperation(.data_delete, "temp:456");

    // Execute batch atomically
    const result = try processor.executeBatch();
    if (result.err) |err| {
        std.debug.print("Batch failed: {}\n", .{err});
    } else {
        std.debug.print("Batch executed: {} operations\n", .{result.processed});
    }
}

const std = @import("std");
const nen_core = @import("nen-core");

pub fn memoryExample() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Fixed-size stack allocator (zero-allocation)
    var fixed_alloc = nen_core.FixedStackAllocator(64 * 1024){};

    // Allocate different types
    const u8_data = try fixed_alloc.alloc(u8, 100);
    const f32_data = try fixed_alloc.alloc(f32, 50);
    const u64_data = try fixed_alloc.alloc(u64, 25);

    // Use the data...
    u8_data[0] = 42;
    f32_data[0] = 3.14;
    u64_data[0] = 12345;

    // Reset for reuse (ultra-fast)
    fixed_alloc.reset();

    // Memory pool for high-frequency allocations
    var pool = try nen_core.StackMemoryPool.init(allocator, 256, 1000);
    defer pool.deinit();

    // Allocate blocks from pool
    const block1 = try pool.allocBlock();
    const block2 = try pool.allocBlock();

    // Use blocks...
    block1[0] = 0xFF;
    block2[0] = 0xAA;

    // Free blocks back to pool
    pool.freeBlock(block1);
    pool.freeBlock(block2);
}

- StackArena - Ultra-fast stack-backed allocations
- FixedStackAllocator(comptime size) - Zero-allocation for known sizes
- BatchAllocator - Stack-first with heap fallback
- StackMemoryPool - High-performance block allocation
- StringAllocator - Zero-copy string management
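The stack-backed pattern behind these allocators can be illustrated with the standard library alone. The sketch below does not show Nen Core's implementation; it uses `std.heap.FixedBufferAllocator` as a stand-in, and `sumOfSquares` is a hypothetical helper introduced only for this example. It shows a bump allocator serving temporary data from a stack buffer and being rewound between rounds.

```zig
const std = @import("std");

/// Hypothetical helper: builds a temporary table of the first `n` squares
/// and returns their sum. The caller decides where the memory comes from.
fn sumOfSquares(allocator: std.mem.Allocator, n: usize) !u64 {
    // Deliberately never freed: the caller rewinds the whole buffer instead.
    const squares = try allocator.alloc(u64, n);
    for (squares, 0..) |*slot, i| slot.* = @as(u64, i) * i;

    var total: u64 = 0;
    for (squares) |value| total += value;
    return total;
}

pub fn main() !void {
    // 4 KiB of stack memory backs every allocation; the heap is never touched.
    var buffer: [4096]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);

    var round: usize = 0;
    while (round < 3) : (round += 1) {
        const total = try sumOfSquares(fba.allocator(), 10);
        std.debug.print("round {d}: sum = {d}, bytes used = {d}\n", .{ round, total, fba.end_index });
        // Rewinds the bump pointer in one step; nothing is freed individually.
        fba.reset();
    }
}
```

A request larger than the remaining buffer fails with `error.OutOfMemory` rather than spilling to the heap, which is the trade-off a fixed-size allocator makes for predictable cost.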
- SIMDOperations.addScalar() - Add scalar to vector
- SIMDOperations.multiplyScalar() - Multiply vector by scalar
- SIMDOperations.addVectors() - Element-wise vector addition
- SIMDOperations.dotProduct() - Vector dot product
- SIMDOperations.sum() - Sum all elements
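Operations like these map naturally onto Zig's built-in `@Vector` type. The following is a minimal sketch, not the library's code: the free functions `addVectors` and `dotProduct`, the slice-based signatures, and the 4-lane width are all assumptions made for illustration.

```zig
const std = @import("std");

// Assumed lane count; a real implementation might pick this per target.
const lanes = 4;
const Vec = @Vector(lanes, f32);

/// Element-wise a + b written into out. All slices must have equal length.
fn addVectors(a: []const f32, b: []const f32, out: []f32) void {
    std.debug.assert(a.len == b.len and a.len == out.len);
    var i: usize = 0;
    // Process full groups of `lanes` elements with vector arithmetic.
    while (i + lanes <= a.len) : (i += lanes) {
        const va: Vec = a[i..][0..lanes].*;
        const vb: Vec = b[i..][0..lanes].*;
        out[i..][0..lanes].* = va + vb;
    }
    // Scalar loop for the elements that do not fill a whole vector.
    while (i < a.len) : (i += 1) out[i] = a[i] + b[i];
}

/// Dot product using a vector accumulator and one horizontal reduction.
fn dotProduct(a: []const f32, b: []const f32) f32 {
    std.debug.assert(a.len == b.len);
    var acc: Vec = @splat(0.0);
    var i: usize = 0;
    while (i + lanes <= a.len) : (i += lanes) {
        const va: Vec = a[i..][0..lanes].*;
        const vb: Vec = b[i..][0..lanes].*;
        acc += va * vb;
    }
    var total: f32 = @reduce(.Add, acc);
    while (i < a.len) : (i += 1) total += a[i] * b[i];
    return total;
}

pub fn main() void {
    const a = [_]f32{ 1.0, 2.0, 3.0, 4.0, 5.0 };
    const b = [_]f32{ 2.0, 3.0, 4.0, 5.0, 6.0 };
    var sum = [_]f32{0.0} ** a.len;
    addVectors(&a, &b, &sum);
    std.debug.print("sum = {any}, dot = {d}\n", .{ sum, dotProduct(&a, &b) });
}
```

The five-element inputs exercise both paths: one full vector of four lanes plus a one-element scalar tail.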
- FastMath.fastExp() - Fast exponential approximation
- FastMath.fastLn() - Fast natural logarithm
- FastMath.fastSqrt() - Fast square root
- FastMath.fastSin() - Fast sine approximation
- FastMath.fastCos() - Fast cosine approximation
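To give a feel for what a bit-manipulation approximation looks like, here is the classic fast inverse square root trick adapted to Zig. This is a well-known technique shown for illustration only; whether `FastMath.fastSqrt()` uses it is not stated here, and the function names below are local to the sketch.

```zig
const std = @import("std");

/// Approximates 1/sqrt(x) for positive, finite x. The magic constant yields a
/// rough first guess from the float's bit pattern; one Newton-Raphson step
/// then refines it to within roughly 0.2% relative error.
fn fastInvSqrt(x: f32) f32 {
    const bits: u32 = @bitCast(x);
    // Wrapping subtraction avoids a safety panic on out-of-domain inputs.
    const guess: f32 = @bitCast(@as(u32, 0x5f3759df) -% (bits >> 1));
    return guess * (1.5 - 0.5 * x * guess * guess);
}

/// sqrt(x) = x * (1/sqrt(x)), so the square root reuses the same approximation.
fn fastSqrt(x: f32) f32 {
    return x * fastInvSqrt(x);
}

pub fn main() void {
    const inputs = [_]f32{ 2.0, 16.0, 100.0 };
    for (inputs) |x| {
        std.debug.print("x = {d}: fast = {d:.4}, exact = {d:.4}\n", .{ x, fastSqrt(x), @sqrt(x) });
    }
}
```

Approximations of this kind trade a small, bounded error for fewer instructions, so they suit workloads where that error is acceptable.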
- XorShift32 - High-performance 32-bit RNG
- SplitMix64 - High-quality 64-bit RNG
- PCG32 - Permuted Congruential Generator
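For readers unfamiliar with xorshift generators, the sketch below implements Marsaglia's 32-bit xorshift with the common 13/17/5 shift triple. It mirrors the `init`/`next` shape used in the quick-start example, but the struct is a local stand-in; the shift constants and zero-seed handling in Nen Core itself are not documented here and may differ.

```zig
const std = @import("std");

/// Local illustration of a 32-bit xorshift generator (Marsaglia, 13/17/5).
const XorShift32 = struct {
    state: u32,

    fn init(seed: u32) XorShift32 {
        // Zero is a fixed point of the recurrence, so substitute a nonzero seed.
        return .{ .state = if (seed == 0) 1 else seed };
    }

    fn next(self: *XorShift32) u32 {
        var x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        return x;
    }
};

pub fn main() void {
    var rng = XorShift32.init(1);
    var i: usize = 0;
    while (i < 3) : (i += 1) {
        std.debug.print("{d}\n", .{rng.next()});
    }
}
```

The whole generator is three shifts and three XORs on a single word of state, which is why it inlines well. It is not suitable for cryptographic use.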
- BatchProcessor - TigerBeetle-style batch processing
- BatchAPI - High-level batch interface
- ClientBatcher - Automatic client-side batching
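The core idea behind this style of batching is to queue operations in storage that is sized up front and then hand them off in one pass. The sketch below is a simplified illustration under that assumption; `Batch`, `Operation`, `OpKind`, and `printOperation` are hypothetical names, and it does not model the atomic commit that `BatchProcessor` advertises.

```zig
const std = @import("std");

const OpKind = enum { data_write, data_read, data_delete };
const Operation = struct { kind: OpKind, payload: []const u8 };

/// Fixed-capacity batch: storage is part of the struct, so queuing an
/// operation never allocates.
fn Batch(comptime capacity: usize) type {
    return struct {
        const Self = @This();

        ops: [capacity]Operation = undefined,
        len: usize = 0,

        fn add(self: *Self, kind: OpKind, payload: []const u8) error{BatchFull}!void {
            if (self.len == capacity) return error.BatchFull;
            self.ops[self.len] = .{ .kind = kind, .payload = payload };
            self.len += 1;
        }

        /// Passes every queued operation to `handler` in insertion order,
        /// empties the batch, and returns how many operations were handled.
        fn flush(self: *Self, handler: *const fn (Operation) void) usize {
            const handled = self.len;
            for (self.ops[0..handled]) |op| handler(op);
            self.len = 0;
            return handled;
        }
    };
}

fn printOperation(op: Operation) void {
    std.debug.print("{s}: {s}\n", .{ @tagName(op.kind), op.payload });
}

pub fn main() !void {
    var batch: Batch(8) = .{};
    try batch.add(.data_write, "Hello, World!");
    try batch.add(.data_read, "user:123");
    try batch.add(.data_delete, "temp:456");
    const handled = batch.flush(&printOperation);
    std.debug.print("flushed {d} operations\n", .{handled});
}
```

Because capacity is a compile-time parameter, a full batch surfaces as an explicit `error.BatchFull` that the caller can answer by flushing first.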
- DataType - Unified data type definitions
- Shape - Tensor shape management
- Backend - Compute backend types
- QuantizationType - Quantization schemes
# Maximum performance (production)
zig build -Doptimize=ReleaseFast
# Balanced performance and safety
zig build -Doptimize=ReleaseSafe
# Development with debug info
zig build -Doptimize=Debug

All hot-path functions are strategically inlined for maximum performance:
- SIMD operations in tight loops
- Fast math approximations
- Memory allocation/deallocation
- RNG operations
- Metrics recording
# Numerical computing demo
zig build run-numerical
# Memory management demo
zig build run-memory
# Unified data types demo
zig build run-unified
# TigerBeetle batching demo
zig build run-batching
# Inline performance demo
zig build run-inline
# Release performance demo
zig build run-release

# Run all tests
zig build test
# Run with ReleaseFast optimization
zig build -Doptimize=ReleaseFast test
# Run performance benchmarks
zig build -Doptimize=ReleaseFast run-inline
zig build -Doptimize=ReleaseFast run-release

- Data-Oriented Design (DOD)
  - Cache-friendly memory layouts
  - Batch processing for maximum throughput
  - Zero-allocation patterns where possible
- Zero Dependencies
  - Only depends on Zig standard library
  - No external C libraries or dependencies
  - Self-contained and portable
- Performance First
  - Strategic inlining of hot-path functions
  - ReleaseFast optimizations
  - Advanced allocator patterns
- Ecosystem Foundation
  - Consolidates common functionality
  - Eliminates code duplication
  - Provides unified interfaces
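The cache-friendly layout principle is easiest to see by contrasting two ways of storing the same records. The sketch below uses hypothetical particle types, not types from Nen Core: it stores each field in its own contiguous array (Struct of Arrays), so a loop that reads only `mass` does not pull the other fields through the cache.

```zig
const std = @import("std");

const count = 1024;

// Array of Structs: one particle's fields sit next to each other, so reading
// only `mass` still drags x, y, and alive through the cache.
const ParticleAoS = struct { x: f32, y: f32, mass: f32, alive: bool };

// Struct of Arrays: each field is contiguous, so a pass over `mass` touches
// exactly count * @sizeOf(f32) bytes.
const ParticlesSoA = struct {
    x: [count]f32,
    y: [count]f32,
    mass: [count]f32,
    alive: [count]bool,
};

fn totalMass(particles: *const ParticlesSoA) f32 {
    var total: f32 = 0;
    for (particles.mass) |mass| total += mass;
    return total;
}

pub fn main() void {
    var particles: ParticlesSoA = undefined;
    @memset(&particles.x, 0.0);
    @memset(&particles.y, 0.0);
    @memset(&particles.mass, 2.0);
    @memset(&particles.alive, true);

    std.debug.print("AoS stride: {d} bytes per particle\n", .{@sizeOf(ParticleAoS)});
    std.debug.print("SoA mass pass: {d} bytes total\n", .{@sizeOf(@TypeOf(particles.mass))});
    std.debug.print("total mass = {d}\n", .{totalMass(&particles)});
}
```

The standard library's `std.MultiArrayList` offers a growable version of the same layout when the element count is not known at compile time.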
src/
├── lib.zig                  # Main library file
├── memory.zig               # Memory management
├── simd.zig                 # SIMD operations
├── math.zig                 # Fast math functions
├── rng.zig                  # Random number generation
├── batching.zig             # TigerBeetle-style batching
├── advanced_allocators.zig  # Advanced allocator patterns
├── data_types.zig           # Unified data types
├── version.zig              # Version management
├── unified_errors.zig       # Error handling
├── unified_constants.zig    # Common constants
├── assertions.zig           # Assertion utilities
├── constants.zig            # DOD constants
├── errors.zig               # Error definitions
├── layouts.zig              # Memory layouts
└── metrics.zig              # Performance metrics
| Operation | Debug | ReleaseFast | Speedup |
|---|---|---|---|
| SIMD Operations | 335M ops/sec | 9,330M ops/sec | 28x |
| Fast Math | 204M ops/sec | 1,511M ops/sec | 7.4x |
| RNG Operations | 126M ops/sec | 370M ops/sec | 2.9x |
| Memory Management | 50M ops/sec | 490M ops/sec | 9.8x |
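Throughput figures like these depend heavily on the build mode and the machine, so it helps to measure on your own hardware. The following is a generic micro-benchmark sketch, not the harness used to produce the table; `sumSquares` is a hypothetical workload, and `std.time.Timer` reflects the Zig 0.15.x standard library.

```zig
const std = @import("std");

/// Hypothetical workload: sum of squared elements.
fn sumSquares(values: []const f32) f32 {
    var total: f32 = 0;
    for (values) |value| total += value * value;
    return total;
}

pub fn main() !void {
    var values: [4096]f32 = undefined;
    for (&values, 0..) |*value, index| value.* = @floatFromInt(index % 16);

    const iterations = 10_000;
    var timer = try std.time.Timer.start();
    var round: usize = 0;
    while (round < iterations) : (round += 1) {
        // Keeps the optimizer from deleting the otherwise unused result.
        std.mem.doNotOptimizeAway(sumSquares(&values));
    }
    const elapsed_ns = timer.read();

    // One "op" here is one element processed.
    const ops: f64 = @floatFromInt(iterations * values.len);
    const seconds = @as(f64, @floatFromInt(elapsed_ns)) / std.time.ns_per_s;
    std.debug.print("{d:.1}M ops/sec\n", .{ops / seconds / 1e6});
}
```

Running the same file with `-O Debug` and `-O ReleaseFast` gives a Debug-versus-ReleaseFast comparison for this one workload; the ratio will vary by workload and CPU.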
| Allocator | Performance | Use Case |
|---|---|---|
| Stack Arena | 125M allocations/sec | Temporary data |
| Fixed Stack | 937M operations/sec | Known sizes |
| Batch Allocator | 21M allocations/sec | Mixed patterns |
| Memory Pool | 77M operations/sec | Block reuse |
| Heap Allocator | 66M allocations/sec | General purpose |
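The "stack-first with heap fallback" strategy suited to mixed patterns has a direct counterpart in the standard library, `std.heap.stackFallback`. The sketch below uses it purely as an illustration of the idea and makes no claim about how `BatchAllocator` is built; `fitsInStackBuffer` is a hypothetical helper, and `std.heap.page_allocator` stands in for the heap.

```zig
const std = @import("std");

/// Hypothetical helper: allocates `request_bytes` from a stack-first allocator
/// with `stack_bytes` of inline storage and reports whether the allocation
/// was served from that inline buffer.
fn fitsInStackBuffer(comptime stack_bytes: usize, request_bytes: usize) !bool {
    var stack_first = std.heap.stackFallback(stack_bytes, std.heap.page_allocator);
    const allocator = stack_first.get();

    const memory = try allocator.alloc(u8, request_bytes);
    defer allocator.free(memory);

    return stack_first.fixed_buffer_allocator.ownsPtr(memory.ptr);
}

pub fn main() !void {
    // A small request should be served from the inline buffer, while an
    // oversized one should be forwarded to the fallback allocator.
    std.debug.print("256 B from stack buffer: {}\n", .{try fitsInStackBuffer(1024, 256)});
    std.debug.print("64 KiB from stack buffer: {}\n", .{try fitsInStackBuffer(1024, 64 * 1024)});
}
```

Callers see a single `std.mem.Allocator` interface either way, so the fast path costs nothing in API complexity; the price is the extra branch on each allocation and free.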
We welcome contributions! Please see our Contributing Guidelines for details.
# Clone the repository
git clone https://github.com/your-org/nen-core.git
cd nen-core
# Run tests
zig build test
# Run examples
zig build run-numerical
zig build run-memory
zig build run-batching
# Run performance benchmarks
zig build -Doptimize=ReleaseFast run-inline

- Follow Zig's snake_case naming conventions
- Use descriptive variable names (no _ for unused variables)
- Keep functions under 70 lines
- Add assertions for function arguments and return values
- Use zig fmt for formatting
- Maintain 100-column line limit
This project is licensed under the MIT License - see the LICENSE file for details.
- TigerBeetle - Inspiration for high-performance batching patterns
- Zig Community - For the excellent language and ecosystem
- Data-Oriented Design - For performance optimization principles
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: API Docs
Built with ❤️ for high-performance computing in Zig