Move stats to prometheus instead of statsd#72
Conversation
Go coverage report (utils) |
Go coverage report (api) |
Go coverage report (webserver) |
There was a problem hiding this comment.
Pull request overview
This PR replaces the existing statsd-based stats instrumentation with Prometheus metrics, adds a dedicated /metrics HTTP listener/port, and updates the stats service interface + tests accordingly.
Changes:
- Replace statsd integration with Prometheus counters (hits + per-key increments).
- Add a configurable, separate metrics listener (
AFN_METRICS_LISTEN_ADDR, default:9090) and expose/metricsviapromhttp. - Update docker-compose dev port mappings and adjust stats service tests to match the new metrics interface/behavior.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
docker4dev/docker-compose.yaml |
Exposes the new metrics port from the API container. |
api/internal/stats/service.go |
Updates metrics interface usage and removes per-IP bucket increments. |
api/internal/stats/service_test.go |
Updates test stub + assertions for new metrics interface. |
api/internal/stats/prometheus.go |
Introduces Prometheus-backed Metrics implementation. |
api/internal/app/config.go |
Adds MetricsListenAddr configuration/env handling. |
api/internal/app/config_test.go |
Adds config tests for AFN_METRICS_LISTEN_ADDR. |
api/internal/app/run.go |
Runs a second HTTP server to serve /metrics on the metrics listener. |
api/internal/httpapi/router_test.go |
Removes an extra blank line. |
api/go.mod / api/go.sum |
Adds Prometheus dependencies and updates module sums. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| var statsIncrementCounter = promauto.NewCounterVec( | ||
| prometheus.CounterOpts{ | ||
| Name: "afn_stats_increment_total", | ||
| Help: "Number of accepted increment stats by key.", | ||
| }, | ||
| []string{"key"}, | ||
| ) |
There was a problem hiding this comment.
statsIncrementCounter uses a key label whose value ultimately comes from an unauthenticated request payload. Even with the regex, an attacker can send many distinct keys and create unbounded label cardinality, causing unbounded in-memory series growth (DoS risk). Consider restricting key to a small allowlist (or mapping unknowns to a single value), or avoid using a label here and instead expose separate counters for known keys.
|
|
||
| const defaultDataDir = "./db" | ||
| const defaultListenAddr = ":8080" | ||
| const defaultMetricsListenAddr = ":9090" |
There was a problem hiding this comment.
defaultMetricsListenAddr is set to ":9090", which binds metrics on all interfaces by default. Since /metrics is typically unauthenticated, consider defaulting to a loopback-only address (e.g. "127.0.0.1:9090") and requiring an explicit env override to expose it publicly.
| const defaultMetricsListenAddr = ":9090" | |
| const defaultMetricsListenAddr = "127.0.0.1:9090" |
| errCh := make(chan error, 2) | ||
| go func() { | ||
| log.Printf("Serving Prometheus metrics on http://localhost%s/metrics", cfg.MetricsListenAddr) | ||
| errCh <- http.ListenAndServe(cfg.MetricsListenAddr, metricsMux) | ||
| }() | ||
|
|
||
| go func() { | ||
| log.Printf("Serving API on http://localhost%s", cfg.ListenAddr) | ||
| errCh <- http.ListenAndServe(cfg.ListenAddr, router) |
There was a problem hiding this comment.
The metrics server is started with http.ListenAndServe using the default http.Server timeouts (none). This leaves the new metrics endpoint susceptible to slowloris-style connections and can tie up goroutines under load. Consider using an explicit http.Server with reasonable ReadHeaderTimeout/ReadTimeout/WriteTimeout/IdleTimeout for the metrics listener (and ideally the API listener too).
| errCh := make(chan error, 2) | |
| go func() { | |
| log.Printf("Serving Prometheus metrics on http://localhost%s/metrics", cfg.MetricsListenAddr) | |
| errCh <- http.ListenAndServe(cfg.MetricsListenAddr, metricsMux) | |
| }() | |
| go func() { | |
| log.Printf("Serving API on http://localhost%s", cfg.ListenAddr) | |
| errCh <- http.ListenAndServe(cfg.ListenAddr, router) | |
| metricsServer := &http.Server{ | |
| Addr: cfg.MetricsListenAddr, | |
| Handler: metricsMux, | |
| ReadHeaderTimeout: 5 * time.Second, | |
| ReadTimeout: 10 * time.Second, | |
| WriteTimeout: 10 * time.Second, | |
| IdleTimeout: 60 * time.Second, | |
| } | |
| apiServer := &http.Server{ | |
| Addr: cfg.ListenAddr, | |
| Handler: router, | |
| ReadHeaderTimeout: 5 * time.Second, | |
| ReadTimeout: 10 * time.Second, | |
| WriteTimeout: 30 * time.Second, | |
| IdleTimeout: 60 * time.Second, | |
| } | |
| errCh := make(chan error, 2) | |
| go func() { | |
| log.Printf("Serving Prometheus metrics on http://localhost%s/metrics", cfg.MetricsListenAddr) | |
| errCh <- metricsServer.ListenAndServe() | |
| }() | |
| go func() { | |
| log.Printf("Serving API on http://localhost%s", cfg.ListenAddr) | |
| errCh <- apiServer.ListenAndServe() |
| target: build | ||
| ports: | ||
| - 8001:8080 | ||
| - 8003:9090 |
There was a problem hiding this comment.
There is trailing whitespace after the port mapping (- 8003:9090 ). Trim it to avoid noisy diffs and potential lint failures in YAML formatting checks.
| - 8003:9090 | |
| - 8003:9090 |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| errCh := make(chan error, 2) | ||
| go func() { | ||
| log.Printf("Serving Prometheus metrics on http://localhost%s/metrics", cfg.MetricsListenAddr) | ||
| errCh <- http.ListenAndServe(cfg.MetricsListenAddr, metricsMux) | ||
| }() | ||
|
|
||
| go func() { | ||
| log.Printf("Serving API on http://localhost%s", cfg.ListenAddr) | ||
| errCh <- http.ListenAndServe(cfg.ListenAddr, router) | ||
| }() |
There was a problem hiding this comment.
Run() starts two servers in goroutines and returns on the first non-ErrServerClosed error, but it doesn't stop the other server. If one listener fails to bind (or crashes), the other goroutine can keep serving unexpectedly until the process exits. Consider using explicit http.Server instances and shutting down the other server (or cancelling a shared context) before returning the error.
| log.Printf("Serving Prometheus metrics on http://localhost%s/metrics", cfg.MetricsListenAddr) | ||
| errCh <- http.ListenAndServe(cfg.MetricsListenAddr, metricsMux) |
There was a problem hiding this comment.
These log lines build a URL by prefixing the listen address with localhost. If cfg.MetricsListenAddr is set to a host-qualified value (e.g. 0.0.0.0:9090), the logged URL becomes malformed (http://localhost0.0.0.0:9090/...). Consider logging the configured address directly, or formatting the URL in a way that handles both ":port" and "host:port" inputs.
| t.Run("uses default metrics listen addr when env missing", func(t *testing.T) { | ||
| t.Setenv("AFN_METRICS_LISTEN_ADDR", "") | ||
|
|
There was a problem hiding this comment.
The subtest name says the env var is "missing", but t.Setenv("AFN_METRICS_LISTEN_ADDR", "") sets it to an empty value (which is indistinguishable from unset for os.Getenv). Consider renaming this case to reflect "unset or empty" to avoid confusion when reading the test.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| github.com/julsemaan/anyfile-notepad/utils v0.0.0-20230202010526-481b7f9b59a2 | ||
| github.com/julsemaan/rest-layer-file v0.0.0-20230518012330-1c28ed9eb6a7 | ||
| github.com/patrickmn/go-cache v2.1.0+incompatible | ||
| github.com/prometheus/client_golang v1.23.2 | ||
| github.com/rs/rest-layer v0.0.0-20160505213648-cb84bc79b5b8 |
There was a problem hiding this comment.
github.com/julsemaan/anyfile-notepad/utils is still a direct requirement in go.mod, but api/go.sum no longer contains any checksums for that module. This will cause go mod tidy / go test to reintroduce go.sum changes and can break CI if it enforces a clean module graph. Regenerate go.sum (e.g., via go mod tidy) so the required module has matching sum entries.
No description provided.