Commit c4ed07a: support caching metrics
1 parent 285e17a

3 files changed: 224 additions & 0 deletions

README.md (83 additions & 0 deletions)
@@ -333,6 +333,89 @@ Notes: one file per key; atomic writes; optional compression and dedupe to skip
---

### Custom Storage

Implement your own storage backend by following the `CacheStorage` protocol:

```python
from advanced_caching import CacheStorage, CacheEntry, TTLCache
from typing import Any

class MyCustomStorage:
    """Custom cache storage implementation."""

    def get(self, key: str) -> Any | None:
        """Retrieve value by key, or None if not found/expired."""
        ...

    def get_entry(self, key: str) -> CacheEntry | None:
        """Retrieve full cache entry with metadata."""
        ...

    def set(self, key: str, value: Any, ttl: int | None = None) -> None:
        """Store value with optional TTL in seconds."""
        ...

    def set_if_not_exists(self, key: str, value: Any, ttl: int | None = None) -> bool:
        """Atomic set-if-not-exists. Returns True if set, False if key exists."""
        ...

    def delete(self, key: str) -> None:
        """Remove key from storage."""
        ...

    def exists(self, key: str) -> bool:
        """Check if key exists and is not expired."""
        ...

# Validate the implementation
from advanced_caching import validate_cache_storage

validate_cache_storage(MyCustomStorage())

# Use with decorators
@TTLCache.cached("user:{id}", ttl=60, cache=MyCustomStorage())
def get_user(id: int):
    return {"id": id}
```
**Exposing Metrics:**

To track cache operations in your custom storage, wrap it with `InstrumentedStorage`:

```python
from advanced_caching.storage import InstrumentedStorage
from advanced_caching.metrics import InMemoryMetrics

# Create metrics collector
metrics = InMemoryMetrics()

# Wrap your custom storage
instrumented = InstrumentedStorage(
    storage=MyCustomStorage(),
    metrics=metrics,
    cache_name="my_custom_cache",
)

# Use instrumented storage
@TTLCache.cached("user:{id}", ttl=60, cache=instrumented)
def get_user(id: int):
    return {"id": id}

# Query metrics
stats = metrics.get_stats()
# Includes: hits, misses, latency, errors, memory usage for "my_custom_cache"
```
`InstrumentedStorage` automatically tracks:

- All cache operations (get, set, delete)
- Operation latency (p50/p95/p99 percentiles)
- Errors with exception types
- Memory usage (if your storage supports it)

See [Metrics Documentation](docs/metrics.md) for details.
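The latency percentiles can be illustrated with the standard library (a sketch of how p50/p95/p99 might be derived from raw samples, not the library's actual implementation):

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Approximate p50/p95/p99 from raw latency samples in milliseconds."""
    # quantiles(n=100) returns the 99 cut points p1..p99
    cuts = statistics.quantiles(samples_ms, n=100)
    return {"p50_ms": cuts[49], "p95_ms": cuts[94], "p99_ms": cuts[98]}

# 90 fast calls and 10 slow ones: the tail shows up in p95/p99, not p50
samples = [0.1] * 90 + [5.0] * 10
percentiles = latency_percentiles(samples)
```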
---

## BGCache (Background)

Single-writer/multi-reader pattern with background refresh and optional independent reader caches.

docs/metrics.md (95 additions & 0 deletions)

@@ -27,6 +27,101 @@ stats = metrics.get_stats()
# Returns: hits, misses, hit_rate, latency percentiles, errors, memory, background_refresh

## Metrics Reference

All metrics collectors track the following operations and expose them through their respective backends.

| Metric Name | Type | What It Represents | When Recorded | Use Case | Labels/Dimensions |
|-------------|------|-------------------|---------------|----------|-------------------|
| **`cache.hits`** | Counter | Number of times data was successfully retrieved from cache without executing the underlying function | Every time a cache lookup finds valid (non-expired) data | Calculate cache effectiveness. High hit count indicates good cache utilization | `cache_name`, `operation` (always "get") |
| **`cache.misses`** | Counter | Number of times data was not found in cache or was expired, requiring function execution | When cache lookup fails (key not found or TTL expired) | Identify cold cache scenarios or TTL tuning needs. High miss rate may indicate TTL is too short | `cache_name`, `operation` (always "get") |
| **`cache.sets`** | Counter | Number of times data was written to cache after function execution | After the underlying function completes successfully and result is stored | Track cache write operations. Should roughly equal misses in normal operation | `cache_name`, `operation` (always "set") |
| **`cache.deletes`** | Counter | Number of explicit cache entry removals (not TTL expirations) | When cache entries are manually deleted or evicted by cache policy | Monitor cache invalidation patterns. Debug cache coherency issues | `cache_name`, `operation` (always "delete") |
| **`cache.hit_rate_percent`** | Gauge (Calculated) | Percentage of cache lookups that resulted in hits: `(hits / (hits + misses)) * 100` | Calculated on-demand (InMemoryMetrics) or periodically (exporters) | **Primary effectiveness metric.** Target: >80% for most apps, >95% for read-heavy workloads. Values: `95.5` = 95.5% from cache, `50.0` = half hit/miss, `0.0` = cold cache | `cache_name` |
| **`cache.operation.duration`** | Histogram/Timer | Time spent in cache operations (get, set, delete) in milliseconds. Provides p50, p95, p99, avg aggregations | For every cache operation, wrapping the storage backend call | Detect storage backend performance issues. Compare local vs remote cache (Redis, S3, GCS). **Example:** `get_p50_ms: 0.12` = fast in-memory, `get_p99_ms: 45.0` = 1% take up to 45ms (network spike?) | `cache_name`, `operation` (get/set/delete) |
| **`cache.errors`** | Counter | Number of errors encountered during cache operations | When cache operations raise exceptions (network failures, serialization errors, Redis connection issues) | Alert on storage backend failures. Identify problematic cache keys. Monitor Redis connection health. Breakdown by `error_type` (e.g., ConnectionError, TimeoutError) | `cache_name`, `operation`, `error_type` |
| **`cache.background_refresh`** | Counter (success/failure breakdown) | Number of background refresh operations for SWRCache (stale refresh) and BGCache (scheduled refresh) | **SWRCache:** When serving stale data triggers background refresh<br>**BGCache:** On every scheduled loader execution | Monitor SWR effectiveness (serving stale while updating). Track BGCache job reliability. High failure rate indicates unreliable data source, network issues, or function errors | `cache_name`, `status` (success/failure) |
| **`cache.memory.bytes`** | Gauge | Approximate memory usage of cached entries in bytes. Also provides `mb` (megabytes) and `entries` (item count) | Periodically or on-demand when using `InstrumentedStorage` wrapper | Prevent memory exhaustion in long-running processes. Size L1 cache appropriately in HybridCache. Trigger eviction at threshold | `cache_name` |
| **`cache.entry.count`** | Gauge | Number of entries currently stored in cache | Tracked alongside memory metrics | Monitor cache growth over time. Validate cache eviction policies. Estimate memory per entry (bytes / entries) | `cache_name` |

---
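The hit-rate calculation in the table can be expressed as a small helper (a sketch matching the stated formula, including the cold-cache edge case where no lookups have occurred; the function name is illustrative):

```python
def hit_rate_percent(hits: int, misses: int) -> float:
    """cache.hit_rate_percent as defined above: (hits / (hits + misses)) * 100."""
    total = hits + misses
    return round(100.0 * hits / total, 2) if total else 0.0

hit_rate_percent(100, 20)  # → 83.33
hit_rate_percent(0, 0)     # → 0.0 (cold cache, no lookups yet)
```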
49+
## Metric Naming Conventions

### InMemoryMetrics

Returns a nested dictionary structure:

```json
{
  "uptime_seconds": 3600.5,
  "caches": {
    "get_user": {
      "hits": 100,
      "misses": 20,
      "sets": 20,
      "deletes": 5,
      "hit_rate_percent": 83.33
    },
    "get_product": {
      "hits": 50,
      "misses": 10,
      "sets": 10,
      "deletes": 2,
      "hit_rate_percent": 83.33
    }
  },
  "latency": {
    "get_user.get_p50_ms": 0.15,
    "get_user.get_p95_ms": 2.5,
    "get_user.get_p99_ms": 10.0,
    "get_user.get_avg_ms": 0.8,
    "get_product.get_p50_ms": 0.12,
    "get_product.set_p50_ms": 1.2
  },
  "errors": {
    "get_user.get": {
      "ConnectionError": 5,
      "TimeoutError": 2
    }
  },
  "memory": {
    "my_cache": {
      "bytes": 1048576,
      "mb": 1.0,
      "entries": 100
    },
    "another_cache": {
      "bytes": 524288,
      "mb": 0.5,
      "entries": 50
    }
  },
  "background_refresh": {
    "get_user": {
      "success": 50,
      "failure": 2
    }
  }
}
```
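Because the structure is nested per cache name, consumers can walk `stats["caches"]` directly. A sketch (with made-up numbers, assuming the shape shown above) that flags caches below a target hit rate:

```python
# Hypothetical stats in the shape returned by InMemoryMetrics.get_stats()
stats = {
    "caches": {
        "get_user": {"hits": 100, "misses": 20, "hit_rate_percent": 83.33},
        "get_product": {"hits": 5, "misses": 45, "hit_rate_percent": 10.0},
    }
}

def low_hit_rate_caches(stats: dict, threshold: float = 80.0) -> list[str]:
    """Names of caches whose hit rate falls below the threshold."""
    return [
        name
        for name, cache in stats["caches"].items()
        if cache["hit_rate_percent"] < threshold
    ]

low_hit_rate_caches(stats)  # → ['get_product']
```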
**Note:** Memory metrics are tracked **per-cache-name** when using the `InstrumentedStorage` wrapper. If multiple functions share the same metrics collector but use different storage backends, each gets its own memory entry under the cache name you pass to `InstrumentedStorage(storage, metrics, "cache_name")`.

### OpenTelemetry

Metric names follow OpenTelemetry conventions:

- `cache.hits` (Counter with `cache_name` attribute)
- `cache.misses` (Counter with `cache_name` attribute)
- `cache.operation.duration` (Histogram with `cache_name`, `operation` attributes)

### GCP Cloud Monitoring

Uses custom metric paths under your configured prefix:

- `custom.googleapis.com/<prefix>/hits`
- `custom.googleapis.com/<prefix>/misses`
- `custom.googleapis.com/<prefix>/latency`

Labels: `cache_name`, `operation`

---

## InMemoryMetrics

Built-in collector for API endpoints. Zero external dependencies, thread-safe.

tests/test_metrics.py (46 additions & 0 deletions)

@@ -394,6 +394,52 @@ def test_instrumented_storage_memory_usage():
    assert metrics.memory_usages[0][2] == 2  # entry_count

```python
def test_memory_metrics_per_cache_name():
    """Test that memory metrics are tracked separately per cache name."""
    from advanced_caching.metrics import InMemoryMetrics

    metrics = InMemoryMetrics()

    # Create two separate caches with different names
    cache1 = InMemCache()
    instrumented1 = InstrumentedStorage(cache1, metrics, "cache_one")

    cache2 = InMemCache()
    instrumented2 = InstrumentedStorage(cache2, metrics, "cache_two")

    # Add data to first cache
    instrumented1.set("key1", "x" * 1000, ttl=60)
    instrumented1.set("key2", "y" * 2000, ttl=60)

    # Add data to second cache
    instrumented2.set("key1", "a" * 500, ttl=60)

    # Get memory usage for each cache
    usage1 = instrumented1.get_memory_usage()
    usage2 = instrumented2.get_memory_usage()

    # Get stats from shared metrics collector
    stats = metrics.get_stats()

    # Verify memory is tracked per cache name
    assert "memory" in stats
    assert "cache_one" in stats["memory"]
    assert "cache_two" in stats["memory"]

    # Verify each cache has its own memory stats
    assert stats["memory"]["cache_one"]["entries"] == 2
    assert stats["memory"]["cache_two"]["entries"] == 1

    # Verify bytes are different for each cache
    assert stats["memory"]["cache_one"]["bytes"] > stats["memory"]["cache_two"]["bytes"]
    assert stats["memory"]["cache_one"]["mb"] > 0
    assert stats["memory"]["cache_two"]["mb"] > 0

    print("\n✓ Memory metrics tracked separately:")
    print(f" - cache_one: {stats['memory']['cache_one']}")
    print(f" - cache_two: {stats['memory']['cache_two']}")
```
```python
def test_metrics_latency_overhead():
    """Benchmark test to ensure metrics add minimal overhead."""
    import timeit
```
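The benchmark body is truncated in this diff. As a rough sketch of how such an overhead measurement can be structured with `timeit` (illustrative only; the names and iteration counts are not from the committed test):

```python
import time
import timeit

store = {"k": 1}
latencies: list[float] = []

def bare_get():
    return store.get("k")

def instrumented_get():
    # Record per-call latency, as an InstrumentedStorage-style wrapper might
    start = time.perf_counter()
    value = store.get("k")
    latencies.append((time.perf_counter() - start) * 1000)
    return value

N = 100_000
base = timeit.timeit(bare_get, number=N)
wrapped = timeit.timeit(instrumented_get, number=N)
overhead_us = (wrapped - base) / N * 1e6  # approximate per-call cost of instrumentation
```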
