feat(analytics): log request duration and client IP #153
Adds two fields to the Analytics Engine request event:

- double3 (duration_ms): wall-clock time spent dispatching the request through the gateway; also added to the response tracing log
- blob8 (client_ip): taken from the CF-Connecting-IP header; user_id (blob5) remains empty for anonymous requests

Refactors RequestEvent to expose index()/blobs()/doubles() so the schema ordering is pure and unit-testable on native targets, with the worker-specific writer cfg-gated to wasm32.

Adds tests/analytics.rs covering path segment extraction, schema ordering, and blob truncation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
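A minimal sketch of the blob truncation those tests exercise, assuming (per the test list in the PR summary) a 256-byte cap on `file_path` that backs off to a UTF-8 char boundary. `truncate_utf8` and `MAX_BLOB_BYTES` are hypothetical names, not the PR's code.

```rust
/// Hypothetical cap for the `file_path` blob, taken from the PR's test list.
const MAX_BLOB_BYTES: usize = 256;

/// Truncate `s` to at most `max` bytes without splitting a UTF-8 character.
/// If the cut would land inside a multi-byte char, back off to that char's start.
fn truncate_utf8(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Backing off matters because slicing a `str` directly (`&s[..256]`) panics when byte 256 falls in the middle of a character.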
🚀 Latest commit deployed to https://source-data-proxy-pr-153.source-coop.workers.dev
Claude finished @alukach's task in 2m 59s. ✅ No blocking issues; safe to merge.
Everything that matters: HMAC-SHA256 keying, empty-IP short-circuit, UTF-8-boundary truncation, …
tylere left a comment:
As discussed off-thread, I suggest storing a hashed client IP (with a fixed secret salt) rather than the raw client IP.
TODO: Consider also logging the Range header.
Replace the raw `client_ip` blob with a salted SHA-256 (`client_ip_hash`) so raw IPs never land in the Analytics Engine dataset, and add a `range` blob capturing the `Range` request header when present.

- `hash_ip(ip, salt)`: pure, hex-encoded SHA-256 of `salt || ip`; an empty IP produces an empty output so anonymous clients don't collapse to one hash.
- Salt comes from the optional `IP_HASH_SALT` secret; if it is unset, hashing degrades to unsalted (with a warning) rather than failing the deploy.
- blob8 is now `client_ip_hash`, blob9 is `range` (append-only; blob1–7 / double1–3 unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
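The append-only layout described above (blob8 and blob9 added after blob1–7, with pure `index()`/`blobs()`/`doubles()` accessors) can be sketched as follows. This is a hypothetical, trimmed-down `RequestEvent`: only the columns this PR names are modelled; blob1–7 and double1–2 are opaque arrays because their contents aren't described here, and the `index` value is a placeholder.

```rust
/// Hypothetical, trimmed-down stand-in for the PR's RequestEvent.
pub struct RequestEvent {
    pub index: String,               // placeholder: whatever the event is indexed by
    pub existing_blobs: [String; 7], // blob1..blob7 (blob5 = user_id), unchanged
    pub client_ip_hash: String,      // blob8 (new)
    pub range: String,               // blob9 (new)
    pub existing_doubles: [f64; 2],  // double1..double2, unchanged
    pub duration_ms: f64,            // double3 (added earlier in this PR)
}

impl RequestEvent {
    pub fn index(&self) -> &str {
        &self.index
    }

    /// Blobs in column order. New columns are only ever appended, so every
    /// pre-existing column keeps its position and existing queries keep working.
    pub fn blobs(&self) -> Vec<&str> {
        let mut out: Vec<&str> = self.existing_blobs.iter().map(String::as_str).collect();
        out.push(&self.client_ip_hash); // blob8
        out.push(&self.range); // blob9
        out
    }

    /// Doubles in column order, with duration_ms last as double3.
    pub fn doubles(&self) -> Vec<f64> {
        let mut out = self.existing_doubles.to_vec();
        out.push(self.duration_ms); // double3
        out
    }
}
```

Because the ordering lives in plain functions with no `worker` types, a native test can assert column positions directly; only the code that hands these vectors to the dataset needs to be wasm32-only.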
Switch `hash_ip` from a bare `SHA256(salt ‖ ip)` secret-prefix construction to HMAC-SHA256 keyed by the salt. HMAC is the conventional keyed-hash primitive and is robust regardless of how the output is later reused, removing the need to reason about concatenation ambiguity or length extension. `hmac` was already a transitive dep.

Behavior is unchanged (deterministic, salt-keyed, empty IP → empty output); only the digest values change. Per-IP hashes are not comparable across this change, which is fine since the salt is new.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
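A dependency-free sketch of the `hash_ip(ip, salt)` contract described above: HMAC-SHA256 keyed by the salt, hex-encoded, with an empty IP mapping to an empty string. The PR uses the `hmac` crate; here SHA-256 (FIPS 180-4) and the HMAC construction (RFC 2104) are inlined only so the block compiles with `std` alone, and the exact signature is an assumption. Production code should keep using the vetted crates rather than hand-rolled crypto.

```rust
// SHA-256 round constants (FIPS 180-4, section 4.2.2).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// SHA-256 of `data` (inlined only to keep this sketch std-only).
fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // Compression rounds.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let t2 = s0.wrapping_add((a & b) ^ (a & c) ^ (b & c));
            hh = g; g = f; f = e; e = d.wrapping_add(t1);
            d = c; c = b; b = a; a = t1.wrapping_add(t2);
        }
        for (x, v) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *x = x.wrapping_add(v);
        }
    }
    let mut out = [0u8; 32];
    for (i, v) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&v.to_be_bytes());
    }
    out
}

/// HMAC-SHA256 (RFC 2104): H((K ^ opad) || H((K ^ ipad) || msg)).
fn hmac_sha256(key: &[u8], msg: &[u8]) -> [u8; 32] {
    // Keys longer than the 64-byte block are hashed; shorter keys are zero-padded.
    let mut k = [0u8; 64];
    if key.len() > 64 {
        k[..32].copy_from_slice(&sha256(key));
    } else {
        k[..key.len()].copy_from_slice(key);
    }
    let mut inner: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(msg);
    let mut outer: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha256(&inner));
    sha256(&outer)
}

/// Hypothetical signature mirroring the PR's `hash_ip(ip, salt)`:
/// hex-encoded HMAC-SHA256 of the IP keyed by the salt; an empty IP stays empty.
pub fn hash_ip(ip: &str, salt: &str) -> String {
    if ip.is_empty() {
        return String::new();
    }
    hmac_sha256(salt.as_bytes(), ip.as_bytes())
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect()
}
```

With salt `Jefe` and input `what do ya want for nothing?` (RFC 4231 test case 2), `hash_ip` returns `5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843`. An empty salt still yields a well-defined HMAC (the key is all zero padding), which is the degraded "empty-key" mode the summary warns is brute-forceable.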
…on-client-ip

# Conflicts:
# src/analytics.rs
# src/config.rs
# src/lib.rs
PR #153 reads IP_HASH_SALT to salt client-IP HMAC hashes, but no workflow uploaded it, so deploys would ship unsalted (brute-forceable) hashes. Pass the per-environment GitHub secret through the deploy callers and include it in `wrangler secret bulk` when present. Like OIDC_PROVIDER_KEY_PREVIOUS, it is optional, since the worker tolerates a missing salt (warns, does not panic).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
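On the worker side, "tolerates a missing salt" could look like the hypothetical `resolve_ip_hash_salt` helper below. It takes the secret as an `Option<String>` so it stays testable natively; treating an empty secret the same as a missing one is an assumption, and `eprintln!` stands in for the worker's own logging.

```rust
/// Hypothetical helper: turn the optional IP_HASH_SALT secret into the HMAC key.
/// A missing (or, by assumption, empty) secret degrades to an empty key and
/// logs a warning instead of panicking, so a deploy without the secret still works.
pub fn resolve_ip_hash_salt(secret: Option<String>) -> String {
    match secret {
        Some(s) if !s.is_empty() => s,
        _ => {
            // Stand-in for the worker's warning log.
            eprintln!("IP_HASH_SALT is not set; client IP hashes are unkeyed and brute-forceable");
            String::new()
        }
    }
}
```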
blob9 stored the Range header verbatim (e.g. "bytes=0-1023"); the unit prefix is constant for byte ranges and just pads every range row. Strip it, storing "0-1023", so the value is directly parseable in queries. A non-bytes unit (legal per RFC 7233, never seen in practice) and the empty string pass through unchanged via strip_range_unit, which is unit-tested.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
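A sketch of `strip_range_unit` consistent with the behavior described above; the signature is an assumption. It is a single prefix strip that falls back to the input unchanged.

```rust
/// Drop the constant `bytes=` unit so "bytes=0-1023" is stored as "0-1023".
/// Any other unit, and the empty string, pass through unchanged.
pub fn strip_range_unit(range: &str) -> &str {
    range.strip_prefix("bytes=").unwrap_or(range)
}
```

`str::strip_prefix` is case-sensitive, so a header such as `Bytes=0-1` would also pass through unchanged; whether that matters depends on what clients actually send.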
🤖 I have created a release *beep* *boop*

---

## [2.2.0](v2.1.2...v2.2.0) (2026-07-01)

### Features

* accept multiple audiences for /.sts token exchange ([#163](#163)) ([e911496](e911496))
* **analytics:** log request duration and client IP ([#153](#153)) ([cad41b1](cad41b1))
* authorize and enable writes to data connections ([#162](#162)) ([85972e8](85972e8))
* make STS max session TTL configurable via env var ([#165](#165)) ([39d15f5](39d15f5))
* OIDC provider ([#132](#132)) ([5671b64](5671b64))
* per-connection backend authentication via OIDC federation ([#147](#147)) ([2f7a12f](2f7a12f))
* **worker:** aggregate live-globe activity by datacenter ([#171](#171)) ([c0a3169](c0a3169))

### Bug Fixes

* **deps:** bump quinn-proto to 0.11.15 (RUSTSEC-2026-0185) ([#161](#161)) ([189e348](189e348))
* **registry:** sync product model with source.coop[#284](https://github.com/source-cooperative/data.source.coop/issues/284) (drop mirror config, use visibility) ([#149](#149)) ([8ecf9b4](8ecf9b4))
* return clear 400 for keyless writes instead of misleading sha256 error ([#168](#168)) ([f1187f5](f1187f5))
* **sigv4:** use encoded request path for inbound signature verification ([#176](#176)) ([56a9520](56a9520))
* **sts:** bound the AssumeRoleWithWebIdentity call with a request timeout ([#172](#172)) ([fa463c7](fa463c7))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: source-release-bot[bot] <265100246+source-release-bot[bot]@users.noreply.github.com>
Summary
Adds three fields to the Analytics Engine request event:
- `double3`: `duration_ms`, the wall-clock time spent dispatching the request through the gateway (measured with `Date.now()` around the gateway dispatch, so it reflects time awaiting the origin). Also added to the `response` tracing log line.
- `blob8`: `client_ip_hash`, an HMAC-SHA256 of the `CF-Connecting-IP` header (keyed by the salt), hex-encoded. The raw IP never enters the dataset; the keyed hash still lets us count distinct clients without storing PII. The salt/key comes from the optional `IP_HASH_SALT` secret. If it is unset, hashing degrades to an unkeyed (empty-key) hash, which is brute-forceable over the small IPv4 space, and logs a warning rather than failing the deploy. An empty IP yields an empty hash, so anonymous clients don't all collapse to one value. `user_id` (blob5) is unchanged and remains empty for anonymous requests.
- `blob9`: `range`, the `Range` request header with the `bytes=` unit stripped (e.g. `bytes=0-1023` is stored as `0-1023`), empty when absent.

All fields are appended after the existing columns, so existing queries against blob1–7 / double1–2 are unaffected.
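The `duration_ms` measurement amounts to two clock reads around the dispatch. The hypothetical `timed` helper below injects a millisecond clock so the arithmetic is testable natively; in the worker that clock would be `Date.now()`. The sketch is synchronous for brevity, whereas the real dispatch is awaited.

```rust
/// Hypothetical helper: run `work` between two reads of a millisecond clock and
/// return its result plus the elapsed time (the value that would become double3).
pub fn timed<T>(now_ms: impl Fn() -> f64, work: impl FnOnce() -> T) -> (T, f64) {
    let start = now_ms();
    let out = work();
    let duration_ms = now_ms() - start;
    (out, duration_ms)
}
```

On Workers, `Date.now()` advances only across I/O rather than during pure CPU work, so a value measured this way is dominated by time spent awaiting the origin.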
Refactor for testability
`RequestEvent` now exposes `index()`/`blobs()`/`doubles()` so the schema ordering lives in pure functions; `log_request` iterates over them. IP hashing lives in a pure `hash_ip(ip, salt)` so it's testable natively. The `worker`-dependent writer is `cfg`-gated to `wasm32`, which lets `tests/analytics.rs` compile the module natively via the same `#[path]` pattern used by the existing routing/pagination tests.

Tests
`tests/analytics.rs` (12 tests) covering:

- `extract_path_segments` (empty, account-only, account/product, nested key)
- … `client_ip_hash` and `range`)
- `file_path` truncation to 256 bytes, including UTF-8 char-boundary backoff
- `hash_ip`: determinism + hex output, salt changes the output, empty IP stays empty

Config
New optional secret `IP_HASH_SALT`, documented in `README.md` and `wrangler.toml`.

🤖 Generated with Claude Code