Comprehensive guide to Prometheus queries for AI Memory Module monitoring. This documentation captures patterns and best practices learned during Epic 6 development, particularly addressing histogram aggregation mistakes that caused 6 HIGH issues in Story 6.3.
Related: See DEC-012 for port configuration (Prometheus: 29090, Grafana: 23000)
- Authenticated Queries
- Overview
- Histogram Queries (CRITICAL)
- Rate vs Increase
- Aggregation Patterns
- Label Cardinality
- Project-Specific Queries
- Token Metrics
- Multi-Embedding Metrics
- V2 Trigger Metrics
- Queue Metrics
- Dashboard Query Examples
Prometheus requires basic auth. Use the helper script:
# Instant query
python3 scripts/monitoring/prometheus_query.py "ai_memory_collection_size"
# Range query (last hour)
python3 scripts/monitoring/prometheus_query.py --range --start 1h "ai_memory_hook_duration_seconds_sum"
# Raw JSON output
python3 scripts/monitoring/prometheus_query.py --raw "up"
# Manual curl (if needed) - set PROMETHEUS_PASSWORD env var first
curl -u admin:$PROMETHEUS_PASSWORD "http://localhost:29090/api/v1/query?query=up"Environment variables for custom credentials:
PROMETHEUS_URL(default: http://localhost:29090)PROMETHEUS_USER(default: admin)PROMETHEUS_PASSWORD(from PROMETHEUS_ADMIN_PASSWORD in docker/.env)
Helper Script: scripts/monitoring/prometheus_query.py
AI Memory Module exposes metrics on port 28000 at /metrics endpoint. All metrics use the ai_memory_* naming convention (project convention: snake_case, ai_memory_ prefix).
Metric Types:
| Type | Metric Name | Description | Labels |
|---|---|---|---|
| Counter | ai_memory_memory_captures_total |
Memory capture attempts | hook_type, status, project |
| Counter | ai_memory_memory_retrievals_total |
Memory retrieval attempts | collection, status |
| Counter | ai_memory_embedding_requests_total |
Embedding generation requests | status, embedding_type |
| Counter | ai_memory_deduplication_events_total |
Deduplicated memories | project |
| Counter | ai_memory_failure_events_total |
Failure events for alerting | component, error_code |
| Counter | ai_memory_tokens_consumed_total |
Token consumption tracking | operation, direction, project |
| Counter | ai_memory_trigger_fires_total |
Trigger activations | trigger_type, status, project |
| Gauge | ai_memory_collection_size |
Points in collection | collection, project |
| Gauge | ai_memory_queue_size |
Pending retry queue items | status |
| Histogram | ai_memory_hook_duration_seconds |
Hook execution time | hook_type |
| Histogram | ai_memory_embedding_duration_seconds |
Embedding generation time | embedding_type |
| Histogram | ai_memory_retrieval_duration_seconds |
Memory retrieval time | None |
| Histogram | ai_memory_context_injection_tokens |
Context injection token counts | hook_type, collection, project |
| Histogram | ai_memory_trigger_results_returned |
Results returned per trigger | trigger_type |
| Info | ai_memory_memory_system_info |
Static system metadata | version, embedding_model, vector_dimensions, collections |
Performance NFRs:
- Hook overhead: <500ms (NFR-P1)
- Embedding generation: <2s (NFR-P2)
- SessionStart retrieval: <3s (NFR-P3)
Most Common Mistake: Missing aggregation in histogram_quantile() queries.
Histograms in Prometheus are cumulative counters stored in _bucket metrics with an le (less than or equal) label. When querying percentiles across multiple instances or time series, you must aggregate by le before applying histogram_quantile().
# This will fail or return incorrect results
histogram_quantile(0.95, rate(ai_memory_hook_duration_seconds_bucket[5m]))
Why it's wrong:
rate()returns per-instance bucket countershistogram_quantile()requires aggregated buckets across all time series- Without aggregation, percentiles are calculated incorrectly or the query fails
# Always use sum by (le) with histogram_quantile
histogram_quantile(0.95, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
Why it's correct:
rate()calculates per-second rate over 5 minutessum by (le)aggregates all bucket counters while preserving thelelabelhistogram_quantile()calculates the 95th percentile from aggregated distribution
If you need to preserve other labels (like hook_type), include them in the aggregation:
# Preserve hook_type to see p95 per hook type
histogram_quantile(0.95, sum by (le, hook_type) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# p50 (median) - typical latency
histogram_quantile(0.50, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# p95 - catches most outliers
histogram_quantile(0.95, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# p99 - extreme outliers
histogram_quantile(0.99, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# Panel with p50, p95, p99 (from memory-performance.json)
# Query A - p50
histogram_quantile(0.50, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# Query B - p95
histogram_quantile(0.95, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# Query C - p99
histogram_quantile(0.99, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
Use rate() when you want the per-second rate of increase over a time window.
# Captures per second over last 5 minutes
rate(ai_memory_memory_captures_total[5m])
# Typical use: dashboards showing ops/sec
sum(rate(ai_memory_memory_captures_total[1h]))
Characteristics:
- Returns per-second average
- Automatically handles counter resets
- Best for: rates, throughput, ops/sec metrics
- Unit: events/second
Use increase() when you want the total increase over a time window.
# Total captures in last 1 hour
increase(ai_memory_memory_captures_total[1h])
# Total failures in last 24 hours
sum(increase(ai_memory_failure_events_total[24h]))
Characteristics:
- Returns total count increase
- Automatically handles counter resets
- Best for: totals, counts over period
- Unit: total events
# ❌ WRONG - Using increase() for per-second rate
sum(increase(ai_memory_memory_captures_total[5m])) # Returns total, not rate
# ✅ CORRECT - Using rate() for per-second rate
sum(rate(ai_memory_memory_captures_total[5m])) # Returns ops/sec
# ❌ WRONG - Using rate() when you want totals
sum(rate(ai_memory_memory_captures_total[1h])) # Returns ops/sec, not total
# ✅ CORRECT - Using increase() for totals
sum(increase(ai_memory_memory_captures_total[1h])) # Returns total count
- Dashboard gauges/graphs showing rate: Use
rate() - Alerting on total events: Use
increase() - Calculating percentages: Use
rate()for both numerator and denominator
Aggregation operators determine which labels are preserved or removed in results.
Keep only the specified labels, aggregate everything else:
# Group by project only
sum by (project) (rate(ai_memory_memory_captures_total[5m]))
# Group by collection and status
sum by (collection, status) (rate(ai_memory_memory_retrievals_total[5m]))
# Group by component and error_code
sum by (component, error_code) (rate(ai_memory_failure_events_total[5m]))
Use when: You want to see breakdowns by specific dimensions.
Remove specified labels, keep everything else:
# Remove only the instance label
sum without (instance) (ai_memory_collection_size)
# Remove multiple labels
sum without (instance, job) (rate(ai_memory_memory_captures_total[5m]))
Use when: You want to aggregate across some labels but preserve most.
Critical Rule: histogram_quantile() requires buckets aggregated by le.
# ❌ WRONG - Missing le aggregation
histogram_quantile(0.95, rate(ai_memory_hook_duration_seconds_bucket[5m]))
# ✅ CORRECT - sum by (le)
histogram_quantile(0.95, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# ✅ CORRECT - sum by (le, hook_type) to preserve hook_type dimension
histogram_quantile(0.95, sum by (le, hook_type) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# max - highest value across series
max(ai_memory_queue_size{status="pending"})
# min - lowest value
min(ai_memory_collection_size)
# avg - average value
avg by (project) (ai_memory_collection_size)
# count - number of time series
count(ai_memory_collection_size)
Golden Rule: Keep label cardinality low to maintain Prometheus performance.
These are safe to use as labels:
| Label | Cardinality | Examples |
|---|---|---|
hook_type |
~5 | PostToolUse, SessionStart, PreToolUse, PreCompact, Stop |
status |
~3 | success, failed, queued |
collection |
~3 | code-patterns, conventions, discussions |
component |
~4 | qdrant, embedding, queue, hook |
error_code |
~5 | QDRANT_UNAVAILABLE, EMBEDDING_TIMEOUT, etc. |
project |
<100 | Project names (acceptable if bounded) |
Never use these as labels:
| Anti-Pattern | Why It's Bad |
|---|---|
user_id |
Unbounded - grows with users |
session_id |
Unbounded - new session every execution |
memory_id |
Unbounded - new UUID per memory |
timestamp |
Unbounded - infinite unique values |
file_path |
High cardinality - thousands of files |
error_message |
High cardinality - unique messages |
# ❌ BAD - Creates thousands of time series
my_metric{user_id="user123", session_id="sess_456", memory_id="uuid-789"}
# ✅ GOOD - Bounded labels only
ai_memory_memory_captures_total{hook_type="PostToolUse", status="success", project="my-project"}
Problems with high cardinality:
- Prometheus memory exhaustion
- Slow query performance
- Expensive storage costs
- Query timeouts
The project label is used for multi-tenancy:
# Safe - project count is bounded to active projects
ai_memory_collection_size{collection="code-patterns", project="my-project"}
Why it works:
- Limited number of active projects per installation (<100)
- Projects are long-lived (not per-request)
- Enables per-project monitoring and alerting
# Capture rate by hook type
sum by (hook_type) (rate(ai_memory_memory_captures_total[5m]))
# Retrieval rate by collection
sum by (collection) (rate(ai_memory_memory_retrievals_total[5m]))
# Embedding request rate
sum(rate(ai_memory_embedding_requests_total[5m]))
# Deduplication rate by project
sum by (project) (rate(ai_memory_deduplication_events_total[5m]))
# Total failure rate
sum(rate(ai_memory_failure_events_total[5m]))
# Failure rate by component
sum by (component) (rate(ai_memory_failure_events_total[5m]))
# Failure rate by error code
sum by (error_code) (rate(ai_memory_failure_events_total[5m]))
# Failure rate by component and error code
sum by (component, error_code) (rate(ai_memory_failure_events_total[5m]))
# Overall capture success rate (percentage)
sum(rate(ai_memory_memory_captures_total{status="success"}[1h])) / sum(rate(ai_memory_memory_captures_total[1h])) * 100
# Success rate by hook type
sum by (hook_type) (rate(ai_memory_memory_captures_total{status="success"}[1h])) / sum by (hook_type) (rate(ai_memory_memory_captures_total[1h])) * 100
# Retrieval success rate by collection
sum by (collection) (rate(ai_memory_memory_retrievals_total{status="success"}[5m])) / sum by (collection) (rate(ai_memory_memory_retrievals_total[5m])) * 100
# Hook p50, p95, p99 (as shown in dashboards)
histogram_quantile(0.50, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
histogram_quantile(0.95, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
histogram_quantile(0.99, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# Embedding p95 (NFR-P2: <2s target)
histogram_quantile(0.95, sum by (le) (rate(ai_memory_embedding_duration_seconds_bucket[5m])))
# Retrieval p95 (NFR-P3: <3s target)
histogram_quantile(0.95, sum by (le) (rate(ai_memory_retrieval_duration_seconds_bucket[5m])))
# Current collection sizes
ai_memory_collection_size
# Collection size by project
ai_memory_collection_size{project="my-project"}
# Total points across all collections
sum(ai_memory_collection_size)
# Queue size (pending items)
ai_memory_queue_size{status="pending"}
# Queue size (exhausted items)
ai_memory_queue_size{status="exhausted"}
# Hook duration exceeds 500ms (NFR-P1)
histogram_quantile(0.95, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m]))) > 0.5
# Embedding duration exceeds 2s (NFR-P2)
histogram_quantile(0.95, sum by (le) (rate(ai_memory_embedding_duration_seconds_bucket[5m]))) > 2.0
# Retrieval duration exceeds 3s (NFR-P3)
histogram_quantile(0.95, sum by (le) (rate(ai_memory_retrieval_duration_seconds_bucket[5m]))) > 3.0
# High failure rate (>1 failure/min)
sum(rate(ai_memory_failure_events_total[5m])) > 1/60
# Collection approaching threshold (>8000 points)
max(ai_memory_collection_size) > 8000
# Queue backlog growing (>10 pending)
ai_memory_queue_size{status="pending"} > 10
# Trigger fires by type (last hour)
sum by (trigger_type) (
increase(ai_memory_trigger_fires_total[1h])
)
# Trigger success rate
sum(ai_memory_trigger_fires_total{status="success"}) /
sum(ai_memory_trigger_fires_total) * 100
# Average results per trigger type
histogram_quantile(0.5,
sum by (trigger_type, le) (
rate(ai_memory_trigger_results_returned_bucket[5m])
)
)
# Total tokens by operation (last 24h)
sum by (operation) (
increase(ai_memory_tokens_consumed_total[24h])
)
# Input vs output tokens
sum by (direction) (
increase(ai_memory_tokens_consumed_total[1h])
)
# Tokens per project
sum by (project) (
increase(ai_memory_tokens_consumed_total[24h])
)
# Collection sizes by project
ai_memory_collection_size{project!="all"}
# Total memories across all collections
sum(ai_memory_collection_size{project="all"})
# Deduplication rate (last hour)
sum(increase(ai_memory_deduplication_events_total[1h])) /
sum(increase(ai_memory_memory_captures_total[1h])) * 100
# Failures by component
sum by (component) (
increase(ai_memory_failure_events_total[1h])
)
# Alert: High failure rate
increase(ai_memory_failure_events_total[5m]) > 5
TECH-DEBT-067: V2.0 token usage tracking for context injection and memory operations.
# Total tokens consumed in last 1 hour
sum(increase(ai_memory_tokens_consumed_total[1h]))
# By direction (input vs output)
sum by (direction) (rate(ai_memory_tokens_consumed_total[5m]))
# By operation type
sum by (operation) (rate(ai_memory_tokens_consumed_total[5m]))
# By project
sum by (project) (rate(ai_memory_tokens_consumed_total[5m]))
Labels:
operation:capture,retrieval,trigger,injection,classification(TECH-DEBT-071)direction:input(TO system),output(FROM system),stored(persisted to memory) (TECH-DEBT-071)project: Project name (from group_id) or"classifier"for system-level operations
Direction Semantics (TECH-DEBT-071):
input- Data flowing INTO an operation (e.g., prompt to classifier, query to search)output- Data flowing OUT of an operation (e.g., LLM response, search results)stored- Data persisted to Qdrant collections (e.g., captured code, user messages)
# Median tokens injected per hook
histogram_quantile(0.50, sum by (le, hook_type) (rate(ai_memory_context_injection_tokens_bucket[5m])))
# p95 tokens injected
histogram_quantile(0.95, sum by (le, hook_type) (rate(ai_memory_context_injection_tokens_bucket[5m])))
# By collection
histogram_quantile(0.95, sum by (le, collection) (rate(ai_memory_context_injection_tokens_bucket[5m])))
# By project (BUG-046 fix)
histogram_quantile(0.95, sum by (le, project) (rate(ai_memory_context_injection_tokens_bucket[5m])))
# Specific project filtering
histogram_quantile(0.95, sum by (le, hook_type) (rate(ai_memory_context_injection_tokens_bucket{project="my-project"}[5m])))
Use Cases:
- Monitor context window usage (target: 70-80% of available tokens)
- Track token consumption patterns by operation type
- Alert on excessive token usage
- Per-project token injection monitoring (BUG-046)
TECH-DEBT-067: V2.0 multi-embedding support for dense, BM25, and SPLADE embeddings.
NOTE: Panels show "No data" for disabled embedding types. This is expected behavior.
# Overall request rate by type
sum by (embedding_type) (rate(ai_memory_embedding_requests_total[5m]))
# Success rate by embedding type
sum by (embedding_type) (rate(ai_memory_embedding_requests_total{status="success"}[5m]))
# Failure rate by type
sum by (embedding_type) (rate(ai_memory_embedding_requests_total{status="failed"}[5m]))
Expected Behavior:
- If only dense embeddings enabled: Only
embedding_type="dense"shows data - BM25/SPLADE series show "No data" until respective services are enabled
# p95 latency by embedding type (CRITICAL: use sum by (le, embedding_type))
histogram_quantile(0.95,
sum by (le, embedding_type) (rate(ai_memory_embedding_duration_seconds_bucket[5m]))
)
# p50 latency by type
histogram_quantile(0.50,
sum by (le, embedding_type) (rate(ai_memory_embedding_duration_seconds_bucket[5m]))
)
# p99 latency by type
histogram_quantile(0.99,
sum by (le, embedding_type) (rate(ai_memory_embedding_duration_seconds_bucket[5m]))
)
NFR Targets:
- Dense: <2s (NFR-P2)
- BM25: <500ms (expected)
- SPLADE: <1s (expected)
# Success rate percentage by type
sum by (embedding_type) (rate(ai_memory_embedding_requests_total{status="success"}[5m]))
/ sum by (embedding_type) (rate(ai_memory_embedding_requests_total[5m])) * 100
Panel Config: Unit: percent, Range: 0-100, Thresholds: 90→red, 95→yellow, 98→green
# Total requests by type (for pie chart)
sum by (embedding_type) (increase(ai_memory_embedding_requests_total[1h]))
Use Cases:
- Monitor which embedding types are in use
- Compare latency across embedding services
- Alert on embedding service failures
TECH-DEBT-067: V2.0 automatic trigger system for decision keywords, best practices, and session history.
# Overall trigger fire rate
sum(rate(ai_memory_trigger_fires_total[5m]))
# By trigger type
sum by (trigger_type) (rate(ai_memory_trigger_fires_total[5m]))
# By status (success, empty, failed)
sum by (status) (rate(ai_memory_trigger_fires_total[5m]))
# By project
sum by (project) (rate(ai_memory_trigger_fires_total[5m]))
Trigger Types:
decision_keywords: "why did we...", "what was decided"best_practices_keywords: "best practice", "how should I"session_history_keywords: "what have we done", "project status"error_detection: Automatic on Bash errorsnew_file: Automatic on Write toolfirst_edit: Automatic on first Edit to file
# Overall success rate (percentage)
sum(rate(ai_memory_trigger_fires_total{status="success"}[5m]))
/ sum(rate(ai_memory_trigger_fires_total[5m])) * 100
# Success rate by trigger type
sum by (trigger_type) (rate(ai_memory_trigger_fires_total{status="success"}[5m]))
/ sum by (trigger_type) (rate(ai_memory_trigger_fires_total[5m])) * 100
Statuses:
success: Trigger fired and returned resultsempty: Trigger fired but no relevant memories foundfailed: Trigger failed (Qdrant unavailable, etc.)
# Median results returned per trigger type
histogram_quantile(0.50, sum by (le, trigger_type) (rate(ai_memory_trigger_results_returned_bucket[5m])))
# p95 results returned
histogram_quantile(0.95, sum by (le, trigger_type) (rate(ai_memory_trigger_results_returned_bucket[5m])))
Expected Values:
- Decision triggers: Typically 1-2 results
- Best practices triggers: Typically 2-3 results
- Session history triggers: Typically 1-3 results
- 0 results indicates
status="empty"
# Trigger fire rate by project
sum by (project, trigger_type) (rate(ai_memory_trigger_fires_total[5m]))
# Most active trigger types per project
topk(3, sum by (project, trigger_type) (rate(ai_memory_trigger_fires_total[5m])))
Use Cases:
- Monitor which triggers are most active
- Identify projects with high trigger activity
- Alert on trigger failures
- Validate trigger configuration changes
QUEUE-UNIFY: Unified retry queue for all memory storage failures with exponential backoff.
# Current queue size by status
ai_memory_queue_size
# Ready for retry (can be processed now)
ai_memory_queue_size{status="ready"}
# Pending (awaiting backoff timer)
ai_memory_queue_size{status="pending"}
# Exhausted (exceeded max retries)
ai_memory_queue_size{status="exhausted"}
Statuses:
ready: Items wherenext_retry_at <= now, can be processed immediatelypending: Items awaiting backoff timer (next_retry_at > now)exhausted: Items that exceededmax_retries(default: 3)
# Total items in queue (should be 0 in healthy system)
sum(ai_memory_queue_size)
# Alert threshold: Any exhausted items is a problem
ai_memory_queue_size{status="exhausted"} > 0
# Queue size over time (for trend analysis)
# Note: This is a gauge, not a counter - no rate() needed
ai_memory_queue_size
# Max queue size in last hour
max_over_time(ai_memory_queue_size[1h])
Backoff Schedule:
- Retry 1: 1 minute
- Retry 2: 5 minutes
- Retry 3+: 15 minutes (capped)
Use Cases:
- Monitor queue health (all values should be 0)
- Alert on exhausted items (indicates persistent failures)
- Track retry patterns over time
- Validate Qdrant/embedding service availability
Dashboard: Memory Overview → "Retry Queue" row
Complete working queries extracted from our Grafana dashboards (docker/grafana/dashboards/*.json).
# Shows overall captures/sec across all hooks
sum(rate(ai_memory_memory_captures_total[1h]))
Panel Config: Unit: ops, Decimals: 2, Thresholds: 0→green, 10→yellow, 50→red
# Shows overall retrievals/sec across all collections
sum(rate(ai_memory_memory_retrievals_total[1h]))
Panel Config: Unit: ops, Decimals: 2, Thresholds: 0→green, 5→yellow, 20→red
# Shows current size of each collection-project combination
ai_memory_collection_size
Legend: {{collection}} - {{project}}
Panel Config: Unit: short, Thresholds: 0→green, 8000→yellow, 10000→red, Max: 12000
# Shows pending items in retry queue
ai_memory_queue_size{status="pending"}
Panel Config: Unit: short, Thresholds: 0→green, 10→yellow, 50→red
# Query A - Captures by project
sum by (project) (rate(ai_memory_memory_captures_total[5m]))
# Query B - Retrievals by collection
sum by (collection) (rate(ai_memory_memory_retrievals_total[5m]))
Legend A: Captures - {{project}}
Legend B: Retrievals - {{collection}}
Panel Config: Unit: ops, Smooth interpolation, 10% fill opacity
# Shows failure rate by component and error code
sum by (component, error_code) (rate(ai_memory_failure_events_total[5m]))
Legend: {{component}} - {{error_code}}
Panel Config: Unit: ops, Smooth interpolation, 20% fill opacity
# Query A - p50 (median)
histogram_quantile(0.50, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# Query B - p95
histogram_quantile(0.95, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# Query C - p99
histogram_quantile(0.99, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
Legend A: p50 - {{hook_type}}
Legend B: p95 - {{hook_type}}
Legend C: p99 - {{hook_type}}
Panel Config: Unit: s, Thresholds: 0.5s→yellow, 1.0s→red, Line threshold display
# Shows distribution of embedding durations as heatmap
rate(ai_memory_embedding_duration_seconds_bucket[5m])
Legend: {{le}}
Format: heatmap
Panel Config: Exponential scale, Spectral color scheme, Y-axis unit: s
# Shows 95th percentile retrieval time
histogram_quantile(0.95, sum by (le) (rate(ai_memory_retrieval_duration_seconds_bucket[5m])))
Panel Config: Unit: s, Decimals: 3, Thresholds: 0→green, 2s→yellow, 3s→red
# Calculate success percentage per hook type
sum(rate(ai_memory_memory_captures_total{status="success"}[1h])) by (hook_type) / sum(rate(ai_memory_memory_captures_total[1h])) by (hook_type) * 100
Legend: {{hook_type}}
Panel Config: Unit: percent, Decimals: 1, Range: 0-100, Thresholds: 0→red, 90→yellow, 95→green
# Current gauge values
ai_memory_collection_size
ai_memory_queue_size{status="pending"}
# Latest histogram bucket values (rarely used directly)
ai_memory_hook_duration_seconds_bucket
# Rate over time
rate(ai_memory_memory_captures_total[5m])
# Increase over time
increase(ai_memory_memory_captures_total[1h])
# Histogram percentiles over time
histogram_quantile(0.95, sum by (le) (rate(ai_memory_hook_duration_seconds_bucket[5m])))
# Sum across all series
sum(rate(ai_memory_memory_captures_total[5m]))
# Sum preserving labels
sum by (project) (rate(ai_memory_memory_captures_total[5m]))
# Average
avg(ai_memory_collection_size)
# Max/Min
max(ai_memory_queue_size)
min(ai_memory_hook_duration_seconds_bucket)
# Ratio/Percentage
(metric_a / metric_b) * 100
# Difference
metric_a - metric_b
# Rate of change
rate(metric[5m])
- ✅ Always use
sum by (le)withhistogram_quantile() - ✅ Use
rate()for per-second rates,increase()for totals - ✅ Keep label cardinality low (<100 unique values per label)
- ✅ Choose appropriate time ranges:
[5m]for real-time,[1h]for trends - ✅ Preserve necessary labels with
sum by (label1, label2) - ✅ Use consistent time ranges in numerator and denominator for ratios
- ✅ Test queries in Prometheus UI (
http://localhost:29090) before adding to dashboards - ✅ Include units in panel configs (
s,ops,percent,short) - ✅ Set meaningful thresholds based on NFRs
- ✅ Use descriptive legend formats with label templating:
{{label}}
- Metrics Definitions:
src/memory/metrics.py - Monitoring API:
monitoring/main.py - Dashboards:
docker/grafana/dashboards/*.json - Port Configuration: DEC-012 (Prometheus: 29090, Grafana: 23000)
- Performance NFRs: NFR-P1 (<500ms hooks), NFR-P2 (<2s embedding), NFR-P3 (<3s retrieval)
Document created per ACT-001 - Captures Prometheus query patterns from Epic 6 development