This guide covers performance benchmarking and flamegraph profiling for TundraDB.
TundraDB includes comprehensive benchmark tests for measuring performance across different graph operations:
- Node Creation: Bulk node insertion performance
- Simple Joins: User→Company relationship queries
- Complex Joins: 3-way User→Friend→Company queries
- Full Scans: Table scan performance
- Filtered Queries: WHERE clause performance with indexes
- macOS with Xcode Command Line Tools
- CMake and Make
- Python 3
- FlameGraph tools (automatically downloaded)
```bash
cd /path/to/tundradb
mkdir -p build && cd build
cmake ..
make benchmark_test
```

```bash
# Run all benchmarks
./tests/benchmark_test

# Run a specific benchmark
./tests/benchmark_test --filter=BM_SimpleJoin

# Run with a custom minimum benchmark time
./tests/benchmark_test --filter=BM_SimpleJoin --benchmark_min_time=10s
```

```bash
# Download FlameGraph tools (only needed once)
cd /tmp && git clone https://github.com/brendangregg/FlameGraph.git
```

```bash
cd build

# Profile a specific benchmark (5 second sample)
./tests/benchmark_test --filter=BM_SimpleJoin --benchmark_repetitions=1 --benchmark_min_time=5s &
PID=$!; sleep 1; sample $PID 5 -f profile_sample.txt; wait

cd ..
python3 parse_sample.py build/profile_sample.txt | /tmp/FlameGraph/flamegraph.pl > build/flamegraph.svg
open build/flamegraph.svg
```

| Filter | Description | What It Measures |
|---|---|---|
| `BM_NodeCreation` | Node insertion performance | Node creation throughput |
| `BM_FullScan` | Table scan performance | Sequential scan speed |
| `BM_SimpleJoin` | User→Company joins | Basic relationship queries |
| `BM_ComplexJoin` | 3-way joins | Complex graph traversals |
| `BM_FilteredQuery` | WHERE clause queries | Index performance |
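The profiling pipeline above pipes `parse_sample.py` output into `flamegraph.pl`, which consumes folded stacks (`frame;frame;frame count`, one line per unique call path). The repository's `parse_sample.py` is not reproduced in this guide; as a hedged sketch of the core idea, a minimal converter from an indented call tree to folded stacks might look like this — it assumes a simplified `<count> <function>` format with two spaces per depth level, whereas real `sample` output carries extra tree-drawing decoration that a production parser must strip:

```python
import re

def fold_stacks(lines, indent=2):
    """Convert an indented call tree ('<count> <function>' per line,
    `indent` spaces per depth level) into folded stacks:
    'frame;frame;frame count', one entry per unique call path."""
    pat = re.compile(r"^(\s*)(\d+)\s+(.*)$")
    entries = []
    for line in lines:
        m = pat.match(line)
        if m:
            entries.append((len(m.group(1)) // indent, int(m.group(2)), m.group(3).strip()))
    folded = {}
    stack = []
    for i, (depth, count, func) in enumerate(entries):
        stack = stack[:depth] + [func]   # truncate to parent depth, push current frame
        next_depth = entries[i + 1][0] if i + 1 < len(entries) else -1
        # Attribute counts at leaves only; self time of interior frames is
        # ignored in this sketch.
        if next_depth <= depth:
            key = ";".join(stack)
            folded[key] = folded.get(key, 0) + count
    return folded

if __name__ == "__main__":
    tree = [
        "100 main",
        "  60 run_query",
        "    60 hash_join",
        "  40 table_scan",
    ]
    for path, count in fold_stacks(tree).items():
        print(path, count)
```

Each emitted line is directly consumable by `flamegraph.pl` on stdin.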
Benchmarks run on three dataset sizes:
- Small: 100 users, 20 companies, 50 products
- Medium: 5,000 users, 500 companies, 1,000 products
- Large: 50,000 users, 5,000 companies, 10,000 products
| Operation | Throughput | Notes |
|---|---|---|
| Node Creation | ~90K nodes/sec | Bulk insertion |
| Full Scan | ~25K nodes/sec | Sequential access |
| Simple Join | ~15K queries/sec | 2-table joins |
| Complex Join | ~7K queries/sec | 3-way joins |
| Filtered Query | ~50K queries/sec | Indexed lookups |
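The throughput figures above imply mean per-operation latencies; a quick back-of-the-envelope conversion (numbers copied from the table, hardware unspecified, so treat the results as rough orders of magnitude):

```python
# Convert approximate throughput (ops/sec) into mean latency per operation (µs).
throughput_ops_per_sec = {
    "Node Creation": 90_000,
    "Full Scan": 25_000,
    "Simple Join": 15_000,
    "Complex Join": 7_000,
    "Filtered Query": 50_000,
}

latency_us = {op: 1e6 / tput for op, tput in throughput_ops_per_sec.items()}
for op, us in latency_us.items():
    print(f"{op}: ~{us:.1f} µs/op")
```

A complex 3-way join costing roughly an order of magnitude more than an indexed lookup is the gap flamegraph profiling is meant to explain.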
```bash
# Profile node creation
./tests/benchmark_test --filter=BM_NodeCreation --benchmark_min_time=5s &
PID=$!; sleep 1; sample $PID 5 -f node_creation_profile.txt; wait
python3 parse_sample.py build/node_creation_profile.txt | /tmp/FlameGraph/flamegraph.pl > build/node_creation_flamegraph.svg
open build/node_creation_flamegraph.svg
```

```bash
# Profile complex joins (longer 8-second sample)
./tests/benchmark_test --filter=BM_ComplexJoin --benchmark_min_time=10s &
PID=$!; sleep 1; sample $PID 8 -f complex_join_profile.txt; wait
python3 parse_sample.py build/complex_join_profile.txt | /tmp/FlameGraph/flamegraph.pl > build/complex_join_flamegraph.svg
open build/complex_join_flamegraph.svg
```

How to read a flamegraph:

- Width = Time: Wider blocks mean more CPU time spent
- Height = Call Stack: Bottom = entry points, top = leaf functions
- Click to Zoom: Focus on specific function call paths
- Hover for Details: See exact function names and sample counts

Patterns to look for:

- Wide blocks at the bottom: Core database operations
- Tall stacks: Deep function call chains (potential optimization targets)
- Fragmented areas: Many small functions (potential consolidation opportunities)
- Color coding: Different colors help distinguish call paths
```bash
# Make sure you're in the build directory
cd build
ls -la tests/benchmark_test  # Should exist and be executable

# Check whether profile data was captured
ls -la profile_sample.txt
head -20 profile_sample.txt  # Should contain call graph data

# Increase the profiling duration
sample $PID 10 -f profile_sample.txt  # 10 seconds instead of 5
```

| File | Description |
|---|---|
| `profile_sample.txt` | Raw macOS `sample` profiler output |
| `flamegraph.svg` | Interactive flamegraph visualization |
| `parse_sample.py` | Custom macOS `sample` format parser |
Create `profile.sh` for easy profiling:

```bash
#!/bin/bash
BENCHMARK=${1:-BM_SimpleJoin}
DURATION=${2:-5}

echo "Profiling $BENCHMARK for ${DURATION}s..."
cd build
./tests/benchmark_test --filter=$BENCHMARK --benchmark_min_time=${DURATION}s &
PID=$!; sleep 1; sample $PID $DURATION -f profile_${BENCHMARK}.txt; wait

echo "Generating flamegraph..."
cd ..
python3 parse_sample.py build/profile_${BENCHMARK}.txt | /tmp/FlameGraph/flamegraph.pl > build/flamegraph_${BENCHMARK}.svg

echo "Opening flamegraph..."
open build/flamegraph_${BENCHMARK}.svg
echo "Profiling complete! Flamegraph: build/flamegraph_${BENCHMARK}.svg"
```

Usage:

```bash
chmod +x profile.sh
./profile.sh BM_ComplexJoin 10  # Profile complex joins for 10 seconds
```

- Identify Hotspots: Look for wide blocks in flamegraphs
- Reduce Call Depth: Tall stacks indicate potential inlining opportunities
- Memory Access Patterns: Sequential access shows up as smooth blocks
- Lock Contention: Scattered patterns may indicate synchronization issues
- I/O Operations: Look for system call patterns in the graph
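Wide blocks can also be ranked without opening the SVG: aggregating sample counts per leaf function from the folded-stack lines (the intermediate format piped into `flamegraph.pl`) yields a plain-text hotspot list. A hedged sketch, assuming the standard `frame;frame;frame count` folded format:

```python
from collections import Counter

def top_leaf_functions(folded_lines, n=5):
    """Rank leaf functions by total sample count across folded stacks
    ('frame;frame;frame count', one per line)."""
    counts = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")  # split off trailing sample count
        if stack and count.isdigit():
            counts[stack.split(";")[-1]] += int(count)  # last frame = leaf
    return counts.most_common(n)
```

Summing by the first frame instead of the last would rank entry points rather than hotspots; both views are useful when triaging a profile.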
The benchmark tests use the Google Benchmark framework, with these key features:
- Automatic timing: Runs until statistically significant
- Multiple iterations: Averages results across runs
- Custom counters: Reports operations per second
- Memory usage: Tracks allocations (when enabled)
- JSON output: Machine-readable results for CI/CD
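The JSON output mentioned above can feed a simple regression gate in CI: Google Benchmark emits a `benchmarks` array with per-benchmark timings when run with `--benchmark_format=json`. A hedged sketch of such a gate — the benchmark names come from this guide, but the latency budgets below are hypothetical placeholders:

```python
import json

# Hypothetical per-benchmark latency budgets (same unit as the report's
# time_unit, nanoseconds by default).
BUDGET_NS = {"BM_SimpleJoin": 80_000, "BM_ComplexJoin": 200_000}

def check_regressions(report_json):
    """Return (name, measured, budget) for benchmarks over budget."""
    report = json.loads(report_json)
    failures = []
    for b in report.get("benchmarks", []):
        budget = BUDGET_NS.get(b["name"])
        if budget is not None and b["real_time"] > budget:
            failures.append((b["name"], b["real_time"], budget))
    return failures
```

A CI job would run `./tests/benchmark_test --benchmark_format=json > report.json`, call `check_regressions` on the file contents, and fail the build if the returned list is non-empty.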
For detailed Google Benchmark options:

```bash
./tests/benchmark_test --help
```