Skip to content

Commit e4f9505

Browse files
Merge pull request #1 from randomizedcoder/init
Init
2 parents 457e27f + 2e38f89 commit e4f9505

37 files changed

Lines changed: 5935 additions & 1 deletion

.github/workflows/ci.yml

Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
name: CI
2+
3+
on:
4+
push:
5+
branches: [main, master]
6+
pull_request:
7+
branches: [main, master]
8+
9+
jobs:
10+
test:
11+
strategy:
12+
matrix:
13+
go-version: ['1.21', '1.22', '1.23']
14+
os: [ubuntu-latest, macos-latest]
15+
16+
runs-on: ${{ matrix.os }}
17+
18+
steps:
19+
- uses: actions/checkout@v4
20+
21+
- name: Set up Go
22+
uses: actions/setup-go@v5
23+
with:
24+
go-version: ${{ matrix.go-version }}
25+
26+
- name: Build
27+
run: go build ./...
28+
29+
- name: Test
30+
run: go test ./...
31+
32+
- name: Test with Race Detector
33+
run: go test -race ./...
34+
35+
- name: Benchmark (sanity check)
36+
run: go test -bench=. -benchtime=100ms ./internal/...
37+
38+
lint:
39+
runs-on: ubuntu-latest
40+
steps:
41+
- uses: actions/checkout@v4
42+
43+
- name: Set up Go
44+
uses: actions/setup-go@v5
45+
with:
46+
go-version: '1.23'
47+
48+
- name: Run golangci-lint
49+
uses: golangci/golangci-lint-action@v4
50+
with:
51+
version: latest
52+
53+
benchmark:
54+
runs-on: ubuntu-latest
55+
needs: test
56+
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
57+
58+
steps:
59+
- uses: actions/checkout@v4
60+
61+
- name: Set up Go
62+
uses: actions/setup-go@v5
63+
with:
64+
go-version: '1.23'
65+
66+
- name: Run Benchmarks
67+
run: |
68+
go test -bench=. -count=5 -benchmem ./internal/... | tee benchmark_results.txt
69+
70+
- name: Upload Benchmark Results
71+
uses: actions/upload-artifact@v4
72+
with:
73+
name: benchmark-results
74+
path: benchmark_results.txt

BENCHMARKING.md

Lines changed: 266 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,266 @@
1+
# Benchmarking Guide
2+
3+
This document provides guidance for running and interpreting benchmarks.
4+
5+
## Quick Start
6+
7+
```bash
8+
# Run all benchmarks
9+
make bench
10+
11+
# Run with multiple iterations for variance analysis
12+
make bench-count
13+
14+
# Run specific package
15+
go test -bench=. -benchmem ./internal/cancel
16+
```
17+
18+
## Environment Setup
19+
20+
### Linux (Recommended)
21+
22+
For consistent, reproducible results:
23+
24+
```bash
25+
# 1. Set CPU governor to performance (prevents frequency scaling)
26+
sudo cpupower frequency-set -g performance
27+
28+
# 2. Disable turbo boost (for consistent clock speed)
29+
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
30+
31+
# 3. Verify CPU frequency is stable
32+
watch -n1 "cat /proc/cpuinfo | grep MHz | head -4"
33+
34+
# 4. Check for background processes
35+
top -bn1 | head -20
36+
```
37+
38+
### GOMAXPROCS
39+
40+
Control how many OS threads execute Go code:
41+
42+
```bash
43+
# Single-threaded execution (lowest variance, no goroutine scheduling noise)
44+
GOMAXPROCS=1 go test -bench=. ./internal/...
45+
46+
# Match physical cores (no hyperthreading)
47+
GOMAXPROCS=4 go test -bench=. ./internal/...
48+
49+
# Default: uses all logical CPUs (GOMAXPROCS=runtime.NumCPU())
50+
go test -bench=. ./internal/...
51+
```
52+
53+
**When to use:**
54+
- `GOMAXPROCS=1`: Best for measuring raw single-threaded performance
55+
- `GOMAXPROCS=N`: For parallel benchmarks (`b.RunParallel`)
56+
- Default: For realistic multi-core scenarios
57+
58+
### Pinning to Single Core (Lowest Variance)
59+
60+
```bash
61+
# Run on CPU 0 only
62+
taskset -c 0 go test -bench=. ./internal/...
63+
64+
# Combined: single core + single GOMAXPROCS (ultimate isolation)
65+
taskset -c 0 GOMAXPROCS=1 go test -bench=. ./internal/...
66+
```
67+
68+
### Scheduler Priority (nice/renice)
69+
70+
Increase process priority to reduce interference from other processes:
71+
72+
```bash
73+
# Run with highest priority (requires root)
74+
sudo nice -n -20 go test -bench=. ./internal/...
75+
76+
# Or renice an existing process
77+
sudo renice -n -20 -p $(pgrep -f "go test")
78+
```
79+
80+
**Nice values:**
81+
- `-20`: Highest priority (most CPU time)
82+
- `0`: Default priority
83+
- `19`: Lowest priority (least CPU time)
84+
85+
**Combined with CPU pinning for maximum isolation:**
86+
87+
```bash
88+
sudo nice -n -20 taskset -c 0 GOMAXPROCS=1 go test -bench=. ./internal/...
89+
```
90+
91+
> **Note:** High priority alone doesn't prevent context switches. For true isolation, combine with CPU pinning and consider isolating CPU cores from the scheduler (`isolcpus` kernel parameter).
92+
93+
### macOS
94+
95+
```bash
96+
# Disable App Nap (can affect timing)
97+
defaults write NSGlobalDomain NSAppSleepDisabled -bool YES
98+
99+
# Run with elevated priority (macOS equivalent of nice)
100+
sudo nice -n -20 go test -bench=. ./internal/...
101+
```
102+
103+
### Advanced: Kernel-Level CPU Isolation
104+
105+
For the most stable benchmarks on dedicated machines:
106+
107+
```bash
108+
# 1. Add to kernel boot parameters (GRUB)
109+
# isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3
110+
111+
# 2. After reboot, CPUs 2-3 are isolated from scheduler
112+
# Run benchmarks on isolated CPU:
113+
sudo taskset -c 2 nice -n -20 GOMAXPROCS=1 go test -bench=. ./internal/...
114+
```
115+
116+
This removes the CPUs from general scheduling entirely.
117+
118+
## Running Benchmarks
119+
120+
### Standard Run
121+
122+
```bash
123+
go test -bench=. -benchmem ./internal/...
124+
```
125+
126+
### With Variance Analysis
127+
128+
Run 10 iterations and analyze with `benchstat`:
129+
130+
```bash
131+
# Install benchstat
132+
go install golang.org/x/perf/cmd/benchstat@latest
133+
134+
# Run benchmarks
135+
go test -bench=. -count=10 ./internal/... > results.txt
136+
137+
# Analyze
138+
benchstat results.txt
139+
```
140+
141+
### Comparing Before/After
142+
143+
```bash
144+
# Before changes
145+
go test -bench=. -count=10 ./internal/... > old.txt
146+
147+
# Make changes...
148+
149+
# After changes
150+
go test -bench=. -count=10 ./internal/... > new.txt
151+
152+
# Compare
153+
benchstat old.txt new.txt
154+
```
155+
156+
## Interpreting Results
157+
158+
### Understanding Output
159+
160+
```
161+
BenchmarkCancel_Atomic_Done_Direct-24 1000000000 0.34 ns/op 0 B/op 0 allocs/op
162+
```
163+
164+
- `-24`: Number of CPUs used (GOMAXPROCS)
165+
- `1000000000`: Iterations run
166+
- `0.34 ns/op`: Time per operation
167+
- `0 B/op`: Bytes allocated per operation
168+
- `0 allocs/op`: Heap allocations per operation
169+
170+
### Expected Variance
171+
172+
- **Good:** < 2% variance
173+
- **Acceptable:** 2-5% variance
174+
- **Investigate:** > 5% variance
175+
176+
High variance causes and mitigations:
177+
178+
| Cause | Mitigation |
179+
|-------|------------|
180+
| Background processes | `nice -n -20`, close browsers/IDEs |
181+
| CPU frequency scaling | Set governor to `performance` |
182+
| Thermal throttling | Let CPU cool between runs |
183+
| Memory pressure | Close memory-heavy apps |
184+
| Goroutine scheduling | `GOMAXPROCS=1` |
185+
| OS scheduler preemption | `taskset -c 0` + `nice -n -20` |
186+
| Hyperthreading noise | Pin to physical core |
187+
188+
### Sanity Checks
189+
190+
1. **Allocations should be 0** for hot-path operations
191+
2. **Relative ordering should be stable** across runs
192+
3. **TSC results may vary** with CPU frequency changes
193+
194+
## CLI Tools
195+
196+
### cmd/context
197+
198+
Compare context cancellation checking:
199+
200+
```bash
201+
go run ./cmd/context -n 10000000
202+
```
203+
204+
### cmd/channel
205+
206+
Compare queue implementations:
207+
208+
```bash
209+
go run ./cmd/channel -n 10000000 -size 1024
210+
```
211+
212+
### cmd/ticker
213+
214+
Compare ticker implementations:
215+
216+
```bash
217+
go run ./cmd/ticker -n 10000000
218+
```
219+
220+
### cmd/context-ticker
221+
222+
Combined benchmark (most realistic):
223+
224+
```bash
225+
go run ./cmd/context-ticker -n 10000000
226+
```
227+
228+
## Typical Results
229+
230+
Results on AMD Ryzen Threadripper PRO 3945WX:
231+
232+
| Component | Standard | Optimized | Speedup |
233+
|-----------|----------|-----------|---------|
234+
| Cancel check | ~10 ns | ~0.3 ns | **30x** |
235+
| Tick check | ~100 ns | ~6 ns (batch) | **16x** |
236+
| Combined | ~96 ns | ~5 ns | **18x** |
237+
238+
## Caveats
239+
240+
1. **Micro-benchmarks measure one dimension** — Real applications have many factors
241+
2. **Results are hardware-dependent** — Your mileage will vary
242+
3. **go:linkname may break**`runtime.nanotime` is internal
243+
4. **TSC requires calibration** — Accuracy depends on CPU frequency stability
244+
245+
## Profiling
246+
247+
### CPU Profile
248+
249+
```bash
250+
go test -bench=BenchmarkCancel -cpuprofile=cpu.prof ./internal/cancel
251+
go tool pprof -http=:8080 cpu.prof
252+
```
253+
254+
### Memory Profile
255+
256+
```bash
257+
go test -bench=BenchmarkQueue -memprofile=mem.prof ./internal/queue
258+
go tool pprof -http=:8080 mem.prof
259+
```
260+
261+
### Trace
262+
263+
```bash
264+
go test -bench=BenchmarkCombined -trace=trace.out ./internal/combined
265+
go tool trace trace.out
266+
```

0 commit comments

Comments
 (0)