| **`cache.hits`** | Counter | Number of times data was successfully retrieved from cache without executing the underlying function | Every time a cache lookup finds valid (non-expired) data | Calculate cache effectiveness. High hit count indicates good cache utilization | `cache_name`, `operation` (always "get") |
| **`cache.misses`** | Counter | Number of times data was not found in cache or was expired, requiring function execution | When cache lookup fails (key not found or TTL expired) | Identify cold cache scenarios or TTL tuning needs. High miss rate may indicate TTL is too short | `cache_name`, `operation` (always "get") |
| **`cache.sets`** | Counter | Number of times data was written to cache after function execution | After the underlying function completes successfully and the result is stored | Track cache write operations. Should roughly equal misses in normal operation | `cache_name`, `operation` (always "set") |
| **`cache.deletes`** | Counter | Number of explicit cache entry removals (not TTL expirations) | When cache entries are manually deleted or evicted by cache policy | Monitor cache invalidation patterns. Debug cache coherency issues | `cache_name`, `operation` (always "delete") |
| **`cache.hit_rate_percent`** | Gauge (Calculated) | Percentage of cache lookups that resulted in hits: `(hits / (hits + misses)) * 100` | Calculated on-demand (InMemoryMetrics) or periodically (exporters) | **Primary effectiveness metric.** Target: >80% for most apps, >95% for read-heavy workloads. Values: `95.5` = 95.5% from cache, `50.0` = half hit/miss, `0.0` = cold cache | `cache_name` |
| **`cache.operation.duration`** | Histogram/Timer | Time spent in cache operations (get, set, delete) in milliseconds. Provides p50, p95, p99, avg aggregations | For every cache operation, wrapping the storage backend call | Detect storage backend performance issues. Compare local vs remote cache (Redis, S3, GCS). **Example:** `get_p50_ms: 0.12` = fast in-memory, `get_p99_ms: 45.0` = 1% take up to 45ms (network spike?) | `cache_name`, `operation` (get/set/delete) |
| **`cache.errors`** | Counter | Number of errors encountered during cache operations | When cache operations raise exceptions (network failures, serialization errors, Redis connection issues) | Alert on storage backend failures. Identify problematic cache keys. Monitor Redis connection health. Breakdown by `error_type` (e.g., ConnectionError, TimeoutError) | `cache_name`, `operation`, `error_type` |
| **`cache.background_refresh`** | Counter (success/failure breakdown) | Number of background refresh operations for SWRCache (stale refresh) and BGCache (scheduled refresh) | **SWRCache:** When serving stale data triggers background refresh<br>**BGCache:** On every scheduled loader execution | Monitor SWR effectiveness (serving stale while updating). Track BGCache job reliability. High failure rate indicates unreliable data source, network issues, or function errors | `cache_name`, `status` (success/failure) |
| **`cache.memory.bytes`** | Gauge | Approximate memory usage of cached entries in bytes. Also provides `mb` (megabytes) and `entries` (item count) | Periodically or on-demand when using `InstrumentedStorage` wrapper | Prevent memory exhaustion in long-running processes. Size L1 cache appropriately in HybridCache. Trigger eviction at threshold | `cache_name` |
| **`cache.entry.count`** | Gauge | Number of entries currently stored in cache | Tracked alongside memory metrics | Monitor cache growth over time. Validate cache eviction policies. Estimate memory per entry (bytes / entries) | `cache_name` |
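The hit-rate formula from the table can be sketched as a standalone helper (illustrative only, not part of the library's API); note the guard for the cold-cache case where no lookups have happened yet:

```python
def hit_rate_percent(hits: int, misses: int) -> float:
    """Hit rate as defined above: (hits / (hits + misses)) * 100."""
    total = hits + misses
    # A cold cache (no lookups yet) reports 0.0 rather than dividing by zero.
    return (hits / total) * 100 if total else 0.0

print(hit_rate_percent(100, 20))  # warm cache, roughly 83.33
print(hit_rate_percent(0, 0))     # cold cache, 0.0
```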
---
## Metric Naming Conventions
### InMemoryMetrics
Returns a nested dictionary structure:
```json
{
  "uptime_seconds": 3600.5,
  "caches": {
    "get_user": {
      "hits": 100,
      "misses": 20,
      "sets": 20,
      "deletes": 5,
      "hit_rate_percent": 83.33
    },
    "get_product": {
      "hits": 50,
      "misses": 10,
      "sets": 10,
      "deletes": 2,
      "hit_rate_percent": 83.33
    }
  },
  "latency": {
    "get_user.get_p50_ms": 0.15,
    "get_user.get_p95_ms": 2.5,
    "get_user.get_p99_ms": 10.0,
    "get_user.get_avg_ms": 0.8,
    "get_product.get_p50_ms": 0.12,
    "get_product.set_p50_ms": 1.2
  },
  "errors": {
    "get_user.get": {
      "ConnectionError": 5,
      "TimeoutError": 2
    }
  },
  "memory": {
    "my_cache": {
      "bytes": 1048576,
      "mb": 1.0,
      "entries": 100
    },
    "another_cache": {
      "bytes": 524288,
      "mb": 0.5,
      "entries": 50
    }
  },
  "background_refresh": {
    "get_user": {
      "success": 50,
      "failure": 2
    }
  }
}
```
**Note:** Memory metrics are tracked **per-cache-name** when using the `InstrumentedStorage` wrapper. If multiple functions share the same metrics collector but use different storage backends, each gets its own memory entry under the cache name you pass to `InstrumentedStorage(storage, metrics, "cache_name")`.
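As an illustration, a health check could walk this structure to flag underperforming caches or total up memory; the `snapshot` literal below is hypothetical sample data shaped like the JSON above, not output from the library:

```python
# Hypothetical metrics snapshot shaped like the JSON structure above.
snapshot = {
    "uptime_seconds": 3600.5,
    "caches": {
        "get_user": {"hits": 100, "misses": 20, "sets": 20,
                     "deletes": 5, "hit_rate_percent": 83.33},
        "get_product": {"hits": 50, "misses": 10, "sets": 10,
                        "deletes": 2, "hit_rate_percent": 83.33},
    },
    "memory": {
        "my_cache": {"bytes": 1048576, "mb": 1.0, "entries": 100},
        "another_cache": {"bytes": 524288, "mb": 0.5, "entries": 50},
    },
}

# Flag caches below the >80% target suggested in the table above.
cold = sorted(name for name, stats in snapshot["caches"].items()
              if stats["hit_rate_percent"] < 80.0)

# Total memory held across all instrumented storages.
total_mb = sum(m["mb"] for m in snapshot["memory"].values())

print(cold, total_mb)
```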
### OpenTelemetry
Metric names follow OpenTelemetry conventions:
- `cache.hits` (Counter with `cache_name` attribute)
- `cache.misses` (Counter with `cache_name` attribute)
- `cache.operation.duration` (Histogram with `cache_name`, `operation` attributes)
### GCP Cloud Monitoring
Uses custom metric paths under your configured prefix:
- `custom.googleapis.com/<prefix>/hits`
- `custom.googleapis.com/<prefix>/misses`
- `custom.googleapis.com/<prefix>/latency`
Labels: `cache_name`, `operation`
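Assembling these metric types is plain string composition; in this sketch, `cache_metrics` is a hypothetical prefix, not a default:

```python
PREFIX = "cache_metrics"  # example value; use whatever prefix you configure

def gcp_metric_type(name: str) -> str:
    """Build a custom-metric type path under the configured prefix."""
    return f"custom.googleapis.com/{PREFIX}/{name}"

paths = [gcp_metric_type(n) for n in ("hits", "misses", "latency")]
print(paths)
```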
---
## InMemoryMetrics
A built-in, thread-safe collector with zero external dependencies, designed for exposing metrics through your own API endpoints.
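A minimal sketch of what such a collector can look like internally, using only the standard library (`MiniMetrics` is illustrative and far smaller than the real class):

```python
import threading
from collections import defaultdict

class MiniMetrics:
    """Illustrative thread-safe hit/miss collector, not the library's class."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(lambda: {"hits": 0, "misses": 0})

    def record(self, cache_name: str, hit: bool) -> None:
        # The lock makes concurrent increments from worker threads safe.
        with self._lock:
            self._counts[cache_name]["hits" if hit else "misses"] += 1

    def snapshot(self) -> dict:
        # Compute hit_rate_percent on demand, as InMemoryMetrics does.
        with self._lock:
            out = {}
            for name, c in self._counts.items():
                total = c["hits"] + c["misses"]
                rate = round(c["hits"] / total * 100, 2) if total else 0.0
                out[name] = {**c, "hit_rate_percent": rate}
            return out

m = MiniMetrics()
for _ in range(4):
    m.record("get_user", hit=True)
m.record("get_user", hit=False)
print(m.snapshot())  # 4 hits, 1 miss -> 80.0% hit rate
```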