A lightweight, high-performance metrics collection system designed for Orange Pi and other ARM devices. Collects comprehensive system metrics, stores them locally in SQLite, and uploads to VictoriaMetrics.
- Lightweight: Pure Go implementation, single binary deployment (<10 MB)
- Reliable: SQLite WAL mode with ARM SBC tuning, duplicate detection, automatic retry
- Comprehensive: 8+ system metrics (CPU, memory, disk, network, temperature)
- Observable: Built-in meta-metrics, health monitoring, clock skew detection
- Production-ready: Health endpoints, structured logging, security hardening
- Tested: Comprehensive unit and integration tests (100+ tests, all passing)
- Cross-platform: Runs on macOS (development) and Linux ARM64 (production)
- ✅ 2 metrics: CPU temperature + SRT packet loss (mock)
- ✅ SQLite storage with WAL mode
- ✅ HTTP POST to remote endpoint
- ✅ YAML configuration
- ✅ Cross-compilation for ARM64
- ✅ Systemd service with auto-restart
- ✅ Install script for deployment
- ✅ System Metrics: CPU usage, memory, disk I/O, network traffic
- ✅ VictoriaMetrics Integration: JSONL import, PromQL queries
- ✅ Upload Reliability: Chunked uploads, jittered backoff, duplicate prevention
- ✅ Health Monitoring: Graduated status (ok/degraded/error), HTTP endpoints
- ✅ Meta-Metrics: Observability of the collector itself
- ✅ Clock Skew Detection: Time synchronization monitoring
- ✅ Structured Logging: JSON/console formats with contextual fields
- ✅ Debian Packaging: Production-ready .deb packages for amd64, arm64, and armhf
- ✅ Systemd Watchdog: Automatic restart on hang/crash with 60s timeout
- ✅ Process Locking: Prevents double-start with file-based locking
- ✅ Automated CI/CD: GitHub Actions pipeline for building, testing, and releasing
- ✅ Security Hardening: Systemd directives, dedicated user, resource limits
- ✅ Database Migrations: Automatic schema migrations on upgrade
- ✅ Integration Tests: Docker-based tests for all architectures with VictoriaMetrics
- ✅ Installation Documentation: Quick-start, detailed guides, and troubleshooting
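
The JSONL import mentioned in the feature list refers to VictoriaMetrics' `/api/v1/import` endpoint, which accepts one JSON object per line containing a label set plus parallel arrays of values and millisecond timestamps. The sketch below shows what one such line could look like. The `device_id` label name and the `encodeSample` helper are assumptions for illustration; the labels Tidewatch actually sends are not documented in this README.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// importLine mirrors one line of VictoriaMetrics' JSON line import format:
// a label set plus parallel arrays of values and millisecond timestamps.
type importLine struct {
	Metric     map[string]string `json:"metric"`
	Values     []float64         `json:"values"`
	Timestamps []int64           `json:"timestamps"`
}

// encodeSample renders a single sample as one JSONL line.
// The "device_id" label name is an assumption made for this sketch.
func encodeSample(name, deviceID string, value float64, tsMillis int64) (string, error) {
	line := importLine{
		Metric:     map[string]string{"__name__": name, "device_id": deviceID},
		Values:     []float64{value},
		Timestamps: []int64{tsMillis},
	}
	b, err := json.Marshal(line)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := encodeSample("cpu_usage_percent", "belabox-001", 12.5, 1700000000000)
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

A chunked upload would join many such lines with newlines and POST them in one request body.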
tidewatch/
├── cmd/
│ ├── tidewatch/ # Main collector binary
│ └── metrics-receiver/ # Simple HTTP receiver for testing
├── internal/
│ ├── models/ # Metric data structures
│ ├── config/ # YAML configuration
│ ├── collector/ # Metric collectors (system, mock SRT)
│ ├── storage/ # SQLite storage layer
│ └── uploader/ # HTTP uploader
├── configs/ # Sample configurations
├── scripts/ # Build and install scripts
├── systemd/ # Systemd service file
├── docs/ # Documentation
│ └── belabox-integration.md # Belabox integration notes
├── MILESTONE-1.md # Milestone 1 specification
└── PRD.md # Product requirements document
# Detect architecture
ARCH=$(dpkg --print-architecture)
# Download latest release (replace VERSION with actual version, e.g., 3.0.0-1)
VERSION="3.0.0-1"
wget https://github.com/taniwha3/tidewatch/releases/download/v${VERSION}/tidewatch_${VERSION}_${ARCH}.deb
# Install
sudo apt install ./tidewatch_${VERSION}_${ARCH}.deb
# Verify
sudo systemctl status tidewatch
tidewatch -version

The service starts automatically and begins collecting metrics.

Edit /etc/tidewatch/config.yaml:
remote:
  url: http://your-victoriametrics:8428/api/v1/import
  enabled: true

Restart the service:
sudo systemctl restart tidewatch

Documentation: See docs/installation/ for detailed guides.

docker compose up -d victoria

VictoriaMetrics UI: http://localhost:8428/vmui
# Build
./scripts/build.sh
# Run with default config (sends to VictoriaMetrics)
./bin/tidewatch-darwin -config configs/config.yaml

# Using VictoriaMetrics UI: http://localhost:8428/vmui
# Or using curl:
curl -s 'http://localhost:8428/api/v1/query?query=cpu_usage_percent' | jq .
# List all metrics:
curl -s 'http://localhost:8428/api/v1/label/__name__/values' | jq .

curl http://localhost:9100/health | jq .

./scripts/build.sh

./bin/metrics-receiver-darwin -port 9090

./bin/tidewatch-darwin -config configs/config.yaml

sqlite3 /var/lib/tidewatch/metrics.db \
  "SELECT metric_name, metric_value FROM metrics ORDER BY timestamp_ms DESC LIMIT 10"

# Start service
sudo systemctl start tidewatch
# Stop service
sudo systemctl stop tidewatch
# Restart service
sudo systemctl restart tidewatch
# View status
sudo systemctl status tidewatch
# View logs
sudo journalctl -u tidewatch -f

# Download new version
wget https://github.com/taniwha3/tidewatch/releases/download/vVERSION/tidewatch_VERSION_ARCH.deb
# Install (preserves config and data)
sudo apt install ./tidewatch_*.deb
# Verify upgrade
tidewatch -version
sudo systemctl status tidewatch

# Remove package (keeps config and data)
sudo apt remove tidewatch
# Complete removal (deletes everything)
sudo apt purge tidewatch

Configuration file: /etc/tidewatch/config.yaml
device:
  id: belabox-001 # Unique device identifier
storage:
  path: /var/lib/tidewatch/metrics.db # SQLite database path
  wal_checkpoint_interval: 1h # WAL checkpoint interval (must be positive)
  wal_checkpoint_size_mb: 64 # WAL checkpoint size threshold
remote:
  url: http://example.com/api/metrics # Remote endpoint URL
  enabled: true # Enable remote uploads
  upload_interval: 30s # Upload interval (must be positive)
  retry:
    enabled: true
    max_attempts: 3 # Total attempts (initial + retries)
    initial_backoff: 1s # Initial retry delay (must be positive)
    max_backoff: 30s # Max retry delay (must be positive)
    backoff_multiplier: 2.0
    jitter_percent: 20
metrics:
  - name: cpu.temperature
    interval: 30s # Collection interval (must be positive)
    enabled: true
  - name: srt.packet_loss
    interval: 5s
    enabled: true

All timing configuration values are strictly validated at startup:
- Duration values must be positive (e.g., 30s, 1m, 1h)
- Invalid values cause immediate failure with a clear error message
- No silent fallbacks: misconfigurations are caught early
Examples of invalid configurations that will cause startup failure:
storage:
  wal_checkpoint_interval: -1h # ❌ Negative durations not allowed
remote:
  upload_interval: 0s # ❌ Zero durations not allowed
  retry:
    initial_backoff: not-a-time # ❌ Invalid duration format

This hard-fail behavior prevents:
- Negative retry backoffs that cause immediate retry hammering
- Zero intervals that cause time.NewTicker panics
- Silent misconfigurations that go unnoticed in production
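
The validation rules above can be sketched with the standard library's `time.ParseDuration`. This is a minimal illustration, not the project's actual validator: the helper name `parsePositiveDuration` and the error wording are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// parsePositiveDuration parses a config value such as "30s" and rejects
// anything that is malformed, zero, or negative.
func parsePositiveDuration(field, raw string) (time.Duration, error) {
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("%s: invalid duration %q: %w", field, raw, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("%s: duration must be positive, got %s", field, d)
	}
	return d, nil
}

func main() {
	// Field names are illustrative labels for the error messages.
	inputs := []struct{ field, raw string }{
		{"upload_interval", "30s"},
		{"wal_checkpoint_interval", "-1h"},
		{"upload_interval", "0s"},
		{"initial_backoff", "not-a-time"},
	}
	for _, in := range inputs {
		if d, err := parsePositiveDuration(in.field, in.raw); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", in.field, d)
		}
	}
}
```

Rejecting zero at parse time matters because `time.NewTicker` panics when given a non-positive interval, so the check turns a runtime panic into a startup error.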
For complete configuration examples, see:
- configs/config.yaml - Production configuration
- configs/config.dev.yaml - Development configuration
- configs/config.prod.yaml - Production template
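
To make the retry settings in the sample configuration concrete, here is one plausible reading of them: exponential backoff capped at max_backoff, with the delay spread uniformly within ±jitter_percent. The type and function names are hypothetical, and the uploader's real formula may differ (for example, it may clamp the jittered value to max_backoff).

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// retryConfig holds the retry settings from the sample configuration.
type retryConfig struct {
	MaxAttempts       int
	InitialBackoff    time.Duration
	MaxBackoff        time.Duration
	BackoffMultiplier float64
	JitterPercent     float64
}

// sampleRetry returns the values shown in the sample configuration.
func sampleRetry() retryConfig {
	return retryConfig{
		MaxAttempts:       3,
		InitialBackoff:    time.Second,
		MaxBackoff:        30 * time.Second,
		BackoffMultiplier: 2.0,
		JitterPercent:     20,
	}
}

// baseDelay returns the un-jittered delay before a retry (0 = first retry):
// initial * multiplier^retry, capped at MaxBackoff.
func baseDelay(cfg retryConfig, retry int) time.Duration {
	d := float64(cfg.InitialBackoff) * math.Pow(cfg.BackoffMultiplier, float64(retry))
	if d > float64(cfg.MaxBackoff) {
		return cfg.MaxBackoff
	}
	return time.Duration(d)
}

// jitteredDelay spreads the base delay within ±JitterPercent.
// u is a uniform random number in [0, 1).
func jitteredDelay(cfg retryConfig, retry int, u float64) time.Duration {
	base := float64(baseDelay(cfg, retry))
	spread := base * cfg.JitterPercent / 100
	return time.Duration(base - spread + u*2*spread)
}

func main() {
	cfg := sampleRetry()
	// max_attempts counts the initial attempt, so 3 attempts means 2 retries.
	for retry := 0; retry < cfg.MaxAttempts-1; retry++ {
		fmt.Printf("retry %d: base %v, jittered %v\n",
			retry+1, baseDelay(cfg, retry), jitteredDelay(cfg, retry, rand.Float64()))
	}
}
```

With the sample values, the base delays are 1s and 2s, and jitter moves each within 20% of that. Jitter keeps a fleet of devices from retrying in lockstep after a shared outage.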
go test ./... -v

go test ./... -cover

- Models: 4/4 tests passing
- Config: 7/7 tests passing
- Collectors: 8/8 tests passing
- Storage: 19/19 tests passing
- Uploader: 16/16 tests passing
- Total: 54/54 tests passing ✅
- CPU Usage (cpu_usage_percent, cpu_core_usage_percent)
  - Overall and per-core CPU usage with delta calculation
  - Wraparound detection, first-sample skip
  - Mock implementation on macOS
- Memory (memory_used_bytes, memory_available_bytes, memory_total_bytes)
  - Canonical used calculation: MemTotal - MemAvailable
  - Swap metrics: memory_swap_used_bytes, memory_swap_total_bytes
  - Mock implementation on macOS
- Disk I/O (disk_read_ops_total, disk_write_ops_total, disk_read_bytes_total, disk_write_bytes_total)
  - Reads/writes in ops/s and bytes/s
  - Time metrics: disk_read_time_ms_total, disk_write_time_ms_total
  - Per-device metrics with whole-device filtering (no partitions)
  - Sector→byte conversion (512 bytes per sector per kernel docs)
- Network (network_rx_bytes_total, network_tx_bytes_total, network_rx_packets_total, network_tx_packets_total)
  - Per-interface traffic counters
  - Error counters: network_rx_errors_total, network_tx_errors_total
  - Wraparound detection
  - Cardinality guard: max 32 interfaces (prevents explosion)
  - Excludes: lo, docker*, veth*, br-*, wlan.mon, virbr., etc.
  - Mock implementation on macOS
- Temperature (cpu_temperature_celsius)
  - Reads from /sys/class/thermal/thermal_zone*/temp
  - Per-zone metrics with zone name tags
  - Mock implementation on macOS
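
The CPU usage entry above mentions delta calculation, wraparound detection, and a first-sample skip. A minimal sketch of how those three pieces can fit together is shown below. The `cpuSample` type and `usagePercent` function are hypothetical, and dropping the sample when a counter moves backwards is one possible policy, not necessarily the one the collector uses.

```go
package main

import "fmt"

// cpuSample holds cumulative CPU time counters, for example summed from
// the fields of a /proc/stat "cpu" line.
type cpuSample struct {
	Total uint64 // all time fields added together
	Idle  uint64 // idle time
}

// usagePercent returns the busy percentage between two samples.
// ok is false when no usable delta exists: there is no previous sample,
// no time has elapsed, or a counter moved backwards (reset or wraparound).
func usagePercent(prev *cpuSample, cur cpuSample) (pct float64, ok bool) {
	if prev == nil {
		return 0, false // first sample: nothing to diff against
	}
	if cur.Total <= prev.Total || cur.Idle < prev.Idle {
		return 0, false // counter went backwards or did not advance
	}
	totalDelta := cur.Total - prev.Total
	idleDelta := cur.Idle - prev.Idle
	if idleDelta > totalDelta {
		return 0, false // inconsistent counters, skip this interval
	}
	return 100 * float64(totalDelta-idleDelta) / float64(totalDelta), true
}

func main() {
	samples := []cpuSample{
		{Total: 1000, Idle: 800},
		{Total: 1100, Idle: 850},
		{Total: 10, Idle: 5}, // simulated counter reset
	}
	var prev *cpuSample
	for i := range samples {
		if pct, ok := usagePercent(prev, samples[i]); ok {
			fmt.Printf("sample %d: %.1f%% busy\n", i, pct)
		} else {
			fmt.Printf("sample %d: skipped\n", i)
		}
		prev = &samples[i]
	}
}
```

Skipping instead of guessing keeps a counter reset from showing up as a usage spike in the stored series.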
- Collection Metrics
  - collector_metrics_collected_total: Total metrics collected
  - collector_metrics_failed_total: Collection failures
  - collector_collection_duration_seconds: Collection time (p50, p95, p99)
- Upload Metrics
  - uploader_metrics_uploaded_total: Total metrics uploaded
  - uploader_upload_failures_total: Upload failures
  - uploader_upload_duration_seconds: Upload time (p50, p95, p99)
- Storage Metrics
  - storage_database_size_bytes: SQLite DB size
  - storage_wal_size_bytes: WAL file size
  - storage_metrics_pending_upload: Pending upload count
- Time Synchronization
  - time_skew_ms: Clock skew relative to server (positive = local ahead)
  - Separate URL check every 5 minutes
  - Warns if skew > 2 seconds
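
The sign convention and warning threshold for time_skew_ms can be illustrated as follows. How the collector obtains the server's time is not stated here; this sketch assumes it reads the HTTP `Date` response header, which has only one-second resolution. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// skewWarnThresholdMs matches the documented "warns if skew > 2 seconds".
const skewWarnThresholdMs = 2000

// serverMillisFromDate converts an HTTP Date header value to Unix milliseconds.
// Using the Date header as the server clock is an assumption of this sketch.
func serverMillisFromDate(header string) (int64, error) {
	t, err := http.ParseTime(header)
	if err != nil {
		return 0, fmt.Errorf("parse Date header %q: %w", header, err)
	}
	return t.UnixMilli(), nil
}

// skewMillis returns local minus server time, so positive means local is ahead.
func skewMillis(localMs, serverMs int64) int64 {
	return localMs - serverMs
}

// shouldWarn reports whether the skew magnitude exceeds the threshold.
func shouldWarn(skewMs int64) bool {
	if skewMs < 0 {
		skewMs = -skewMs
	}
	return skewMs > skewWarnThresholdMs
}

func main() {
	serverMs, err := serverMillisFromDate("Tue, 14 Nov 2023 22:13:20 GMT")
	if err != nil {
		panic(err)
	}
	skew := skewMillis(time.Now().UnixMilli(), serverMs)
	fmt.Printf("time_skew_ms=%d warn=%v\n", skew, shouldWarn(skew))
}
```

A production check would also account for the request's round-trip time, since network latency inflates the apparent skew.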
- Real SRT stats from server-side SRTLA receiver
- Encoder metrics (from journald logs)
- HDMI input metrics (via v4l2-ctl)
- Load averages, system uptime
┌─────────────────────┐
│  Metric Collectors  │ <- cpu.temperature, srt.packet_loss
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   SQLite Storage    │ <- Local buffer (WAL mode)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│    HTTP Uploader    │ -> POST to remote endpoint
└─────────────────────┘
- Go: Fast, easy cross-compilation, single binary
- SQLite: Reliable embedded storage with WAL mode
- Collectors: Pluggable interface for different metric sources
- Storage-first: Always store locally, upload asynchronously
- No dependencies: Pure Go implementation (no CGO)
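
The storage-first principle above can be sketched with an in-memory stand-in for the SQLite layer: every metric is stored first, and pending rows are cleared only after an upload succeeds. The types and functions below are hypothetical and much simpler than the real storage and uploader packages.

```go
package main

import (
	"errors"
	"fmt"
)

// metric is a simplified stand-in for the project's models.Metric.
type metric struct {
	Name  string
	Value float64
}

// buffer is an in-memory stand-in for the SQLite store.
type buffer struct {
	pending []metric
}

// Store always succeeds locally, regardless of network state.
func (b *buffer) Store(m metric) { b.pending = append(b.pending, m) }

// Pending reports how many metrics still await upload.
func (b *buffer) Pending() int { return len(b.pending) }

// Flush hands all pending metrics to upload and clears them only on success,
// so a failed upload loses nothing.
func (b *buffer) Flush(upload func([]metric) error) error {
	if len(b.pending) == 0 {
		return nil
	}
	if err := upload(b.pending); err != nil {
		return fmt.Errorf("upload failed, %d metrics kept: %w", len(b.pending), err)
	}
	b.pending = nil
	return nil
}

// failingUpload simulates an unreachable remote endpoint.
func failingUpload([]metric) error { return errors.New("connection refused") }

// okUpload simulates a successful POST.
func okUpload([]metric) error { return nil }

func main() {
	b := &buffer{}
	b.Store(metric{Name: "cpu.temperature", Value: 51.2})
	b.Store(metric{Name: "srt.packet_loss", Value: 0.1})

	fmt.Println("flush while offline:", b.Flush(failingUpload), "pending:", b.Pending())
	fmt.Println("flush while online:", b.Flush(okUpload), "pending:", b.Pending())
}
```

In a running collector, Flush would typically be called from a separate goroutine on the upload interval, so collection never blocks on the network.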
- Binary size: ~9.5 MB (darwin), ~9.3 MB (linux-arm64)
- Memory usage: <50 MB typical
- CPU usage: <5% on Orange Pi 5+ (RK3588)
- Storage: ~1 KB per metric (with indexes)
- Throughput: 1000+ metrics/second store rate
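
As a rough worked example of the storage figure above, the snippet below estimates daily database growth for the two sample metrics (collected every 30s and 5s). It assumes exactly one stored row per collection interval, 1 KB (1024 bytes) per row, and no pruning of uploaded rows; real growth depends on how many series each collector emits and on retention behavior.

```go
package main

import "fmt"

// bytesPerMetric comes from the "~1 KB per metric" figure; 1024 is an
// assumption about what "1 KB" means here.
const bytesPerMetric = 1024

// samplesPerDay assumes one stored row per interval.
// intervalSeconds must be positive, as the config validation requires.
func samplesPerDay(intervalSeconds int) int {
	return 86400 / intervalSeconds
}

// dailyBytes sums the estimated daily storage for a set of collection intervals.
func dailyBytes(intervalsSeconds []int) int {
	total := 0
	for _, s := range intervalsSeconds {
		total += samplesPerDay(s) * bytesPerMetric
	}
	return total
}

func main() {
	b := dailyBytes([]int{30, 5})
	fmt.Printf("%d samples/day, about %.1f MiB/day\n", b/bytesPerMetric, float64(b)/(1<<20))
	// prints "20160 samples/day, about 19.7 MiB/day"
}
```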
# Check config validity
./bin/tidewatch-darwin -config test-config.yaml -version
# Check logs
sudo journalctl -u tidewatch --no-pager -n 50

# Check receiver is accessible
curl http://localhost:9090/health
# Check collector logs for upload errors
sudo journalctl -u tidewatch -f | grep upload

# Check WAL mode is enabled
sqlite3 /var/lib/tidewatch/metrics.db "PRAGMA journal_mode"
# Should return: wal
# Checkpoint WAL if needed
sqlite3 /var/lib/tidewatch/metrics.db "PRAGMA wal_checkpoint(TRUNCATE)"

- Go 1.24+
- SQLite (for testing)
- Implement the Collector interface in internal/collector/:
type Collector interface {
	Name() string
	Collect(ctx context.Context) ([]*models.Metric, error)
}
- Register in cmd/tidewatch/main.go:
case "your.metric":
	coll = collector.NewYourCollector(cfg.Device.ID)
- Add to config:
metrics:
  - name: your.metric
    interval: 30s
    enabled: true

- Quick Start Guide - Get started in 5 minutes
- Detailed Installation - Comprehensive install guide
- Troubleshooting - Common issues and solutions
- Build from Source - Local builds and contributing
- Health Monitoring - Health endpoints, status levels
- VictoriaMetrics Setup - Installation, querying, PromQL
- Milestone 1 - Initial release (complete)
- Milestone 2 - System metrics & reliability (complete)
- Milestone 3 - Debian packaging (complete)
- Product Roadmap - Full product vision
Milestone 4: Real SRT stats, encoder metrics, alerting
Milestone 5: Priority queue, backfill, data retention
[Add license information]
[Add contributing guidelines]