## Summary

Integrate `dial9-tokio-telemetry` anywhere it is materially useful in Psionic, starting with the Tokio-owned serving/runtime surfaces and then extending to transport runtimes where scheduler delay or async queueing can materially affect p99 latency.
The value here is not just "more metrics." It is being able to answer, from a production trace, whether a bad tail-latency event came from one of the following, rather than from the inner model compute itself:

- Tokio worker scheduling delay or worker imbalance
- request-handler CPU work on the async runtime
- async I/O stalls or proxy stalls
- waiting on Psionic's dedicated worker threads or external proxy processes
This aligns with the observation boundary in `docs/ARCHITECTURE.md`: Psionic owns serving, transport, execution substrate, and control/observation boundaries for those layers.
## Why this belongs in Psionic
`psionic-serve` is currently the strongest first target:

- `crates/psionic-serve/src/bin/psionic-openai-server.rs`
- `crates/psionic-serve/src/bin/psionic-gpt-oss-server.rs`
- `crates/psionic-serve/src/openai_http.rs`
- `crates/psionic-mlx-serve/src/bin/psionic-mlx-serve.rs`
- `crates/psionic-mlx-serve/src/lib.rs`
These are real Tokio/Axum serving entrypoints, while actual inference is often pushed onto dedicated `std::thread` workers that reply over oneshot channels inside `crates/psionic-serve/src/openai_http.rs`, or routed through the llama.cpp proxy path in the same file.
That architecture makes dial9 a strong fit for separating runtime-side latency from backend compute latency.
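As a rough illustration of that split (names and handler shape are hypothetical; the real handler lives in `crates/psionic-serve/src/openai_http.rs`), the async side only awaits a oneshot reply while the compute runs on a dedicated thread, so runtime telemetry sees the wait but not the work:

```rust
use tokio::sync::oneshot;

// Hypothetical sketch: the heavy compute runs on a dedicated std::thread
// outside the Tokio runtime; the async handler only awaits the reply.
async fn handle_request(prompt: String) -> String {
    let (tx, rx) = oneshot::channel();
    std::thread::spawn(move || {
        // Inner model compute: invisible to Tokio-level telemetry.
        let output = format!("completion for: {prompt}"); // stand-in for inference
        let _ = tx.send(output);
    });
    // dial9-style runtime telemetry can attribute time spent on this await
    // to "backend wait", separate from scheduler delay or handler CPU work.
    rx.await.expect("worker dropped reply channel")
}
```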
`psionic-net` is the next-best fit because its transport loop is a long-lived Tokio task that can become sensitive to queueing, timer behavior, or socket burst handling:

- `crates/psionic-net/src/lib.rs`
## Important constraints
- dial9 requires Tokio unstable hooks via `--cfg tokio_unstable`.
- The highest-value features are Linux-only: scheduler delay and CPU sampling.
- To get wake-to-poll visibility for spawned tasks, use `TelemetryHandle::spawn` rather than plain `tokio::spawn`.
- For Axum/Hyper, we likely need a traced accept-loop / connection-executor pattern similar to `dial9-tokio-telemetry/examples/metrics-service/src/axum_traced.rs`.
- Many current Psionic entrypoints use `#[tokio::main]`; those need to become explicit runtime builders.
- dial9 will not directly profile the inner compute of `std::thread` workers or the llama.cpp subprocess. It will explain the async/runtime side around them.
## Proposed scope

### Phase 1: Runtime plumbing and safe gating
- Add optional dial9 dependency/feature(s) in runtime-owning crates, starting with `psionic-serve`, `psionic-mlx-serve`, and `psionic-net`.
- Add a consistent runtime config surface for:
  - trace path
  - enable/disable switch
  - task tracking
  - optional Linux CPU profiling / sched events
- Keep a hard no-op path for normal builds and unsupported environments.
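The feature gating might look roughly like this in each runtime-owning crate's `Cargo.toml` (the feature name, crate version, and exact shape are assumptions, not decisions):

```toml
# Hypothetical sketch for crates/psionic-serve/Cargo.toml.
[features]
default = []
dial9-telemetry = ["dep:dial9-tokio-telemetry"]

[dependencies]
dial9-tokio-telemetry = { version = "0.1", optional = true }
```

Keeping the dependency optional gives normal builds the hard no-op path for free: without the feature, none of the telemetry code is even compiled.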
### Phase 2: Instrument the highest-value server runtimes
- Replace `#[tokio::main]` in:
  - `crates/psionic-serve/src/bin/psionic-openai-server.rs`
  - `crates/psionic-serve/src/bin/psionic-gpt-oss-server.rs`
  - `crates/psionic-mlx-serve/src/bin/psionic-mlx-serve.rs`

  with explicit runtime builders wrapped by `TracedRuntime`.
- Instrument `axum::serve(...)` entrypoints in `crates/psionic-serve/src/openai_http.rs`.
- Route server-owned spawns through `TelemetryHandle::spawn` where wake tracking materially improves trace value.
- For Hyper/Axum connection tasks, adopt a traced executor / accept-loop wrapper rather than assuming plain `tokio::spawn` is enough.
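The `#[tokio::main]` replacement could look roughly like this. Only the `tokio::runtime::Builder` portion is standard Tokio; the `TracedRuntime` wrapping shown in the comment is a guess at dial9's API shape, not its confirmed interface:

```rust
// Before: #[tokio::main] async fn main() { serve().await }
// After (sketch): build the runtime explicitly so dial9 can wrap or observe it.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()?;
    // Hypothetical: hand the runtime to dial9 before serving, e.g.
    // let runtime = dial9_tokio_telemetry::TracedRuntime::wrap(runtime, config)?;
    runtime.block_on(async {
        // serve().await
    });
    Ok(())
}
```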
### Phase 3: Instrument transport runtimes
- Add dial9 coverage to the long-lived transport loop in `crates/psionic-net/src/lib.rs`.
- Confirm we can capture:
  - poll durations
  - timer tick behavior
  - wake-to-poll delay during socket bursts
  - worker imbalance or scheduling delay on Linux
### Phase 4: Docs and operator workflow
- Document required flags and caveats:
  - `tokio_unstable`
  - frame pointers for Linux CPU profiling
  - `perf_event_paranoid` / `kptr_restrict`
- Add a minimal runbook for collecting traces locally and in staging/production.
- Explain what dial9 can and cannot tell us in Psionic given the `std::thread` worker architecture.
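A minimal local build sketch for the runbook, assuming Linux and a cargo workspace. The `--cfg tokio_unstable` flag and `force-frame-pointers` codegen option are real rustc/Tokio mechanisms; the `dial9-telemetry` feature name and the exact sysctl values are assumptions:

```shell
# Enable Tokio's unstable hooks (required by dial9) and keep frame pointers
# so Linux CPU sampling can unwind stacks.
export RUSTFLAGS="--cfg tokio_unstable -C force-frame-pointers=yes"
echo "$RUSTFLAGS"

# Linux-only, needs root: relax perf restrictions for CPU sampling.
# sudo sysctl -w kernel.perf_event_paranoid=1
# sudo sysctl -w kernel.kptr_restrict=0

# Build a server binary with telemetry enabled (feature name is hypothetical).
# cargo build -p psionic-serve --features dial9-telemetry
```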
## Lower-priority targets

- `crates/psionic-apple-fm/src/client.rs`

This is useful, but it should come after the main serve and transport surfaces.
## Non-goals
- Do not instrument pure test-only `Runtime::new().block_on(...)` harnesses first.
- Do not treat dial9 as a replacement for backend-specific profiling of CUDA/Metal/MLX/GGUF execution.
- Do not move product/app concerns into Psionic just to support telemetry rollout.
## Acceptance criteria
- We can enable dial9 on the main Psionic server binaries without code changes to downstream callers.
- A disabled configuration is effectively a no-op.
- On Linux, traces let us distinguish scheduler delay vs handler CPU work vs backend wait time.
- On non-Linux, we still get useful runtime poll/park/wake visibility.
- At least one documented trace collection flow exists for:
  - `psionic-openai-server`
  - `psionic-gpt-oss-server`
  - `psionic-mlx-serve`
- Transport tracing exists for `psionic-net`, or the issue is explicitly split and deferred with rationale.