## Summary

Integrate `dial9-tokio-telemetry` anywhere it is materially useful in Psionic, starting with the Tokio-owned serving/runtime surfaces and then extending to transport runtimes where scheduler delay or async queueing can materially affect p99 latency.
The value here is not just "more metrics." It is being able to answer, from a production trace, whether a bad tail-latency event came from one of the following, rather than from the inner model compute itself:

- Tokio worker scheduling delay or worker imbalance
- request-handler CPU work on the async runtime
- async I/O stalls or proxy stalls
- waiting on Psionic's dedicated worker threads or external proxy processes
This aligns with the observation boundary in `docs/ARCHITECTURE.md`: Psionic owns serving, transport, execution substrate, and control/observation boundaries for those layers.
## Why this belongs in Psionic
`psionic-serve` is currently the strongest first target:

- `crates/psionic-serve/src/bin/psionic-openai-server.rs`
- `crates/psionic-serve/src/bin/psionic-gpt-oss-server.rs`
- `crates/psionic-serve/src/openai_http.rs`
- `crates/psionic-mlx-serve/src/bin/psionic-mlx-serve.rs`
- `crates/psionic-mlx-serve/src/lib.rs`
These are real Tokio/Axum serving entrypoints, while actual inference is often pushed onto dedicated `std::thread` workers that reply over oneshot channels inside `crates/psionic-serve/src/openai_http.rs`, or routed through the llama.cpp proxy path in the same file.
That architecture makes dial9 a strong fit for separating runtime-side latency from backend compute latency.
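As a rough illustration of that split (names and handler shape are hypothetical; the real handler lives in `crates/psionic-serve/src/openai_http.rs`), the async side only awaits a oneshot reply while the compute runs on a dedicated thread, so runtime telemetry sees the wait but not the work:

```rust
use tokio::sync::oneshot;

// Hypothetical sketch: the heavy compute runs on a dedicated std::thread
// outside the Tokio runtime; the async handler only awaits the reply.
async fn handle_request(prompt: String) -> String {
    let (tx, rx) = oneshot::channel();
    std::thread::spawn(move || {
        // Inner model compute: invisible to Tokio-level telemetry.
        let output = format!("completion for: {prompt}"); // stand-in for inference
        let _ = tx.send(output);
    });
    // dial9-style runtime telemetry can attribute time spent on this await
    // to "backend wait", separate from scheduler delay or handler CPU work.
    rx.await.expect("worker dropped reply channel")
}
```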
`psionic-net` is the next-best fit because its transport loop is a long-lived Tokio task that can become sensitive to queueing, timer behavior, or socket burst handling:

- `crates/psionic-net/src/lib.rs`
## Important constraints
- dial9 requires Tokio unstable hooks via `--cfg tokio_unstable`.
- The highest-value features are Linux-only: scheduler delay and CPU sampling.
- To get wake-to-poll visibility for spawned tasks, use `TelemetryHandle::spawn` rather than plain `tokio::spawn`.
- For Axum/Hyper, we likely need a traced accept-loop / connection-executor pattern similar to `dial9-tokio-telemetry/examples/metrics-service/src/axum_traced.rs`.
- Many current Psionic entrypoints use `#[tokio::main]`; those need to become explicit runtime builders.
- dial9 will not directly profile the inner compute of `std::thread` workers or the llama.cpp subprocess. It will explain the async/runtime side around them.
## Proposed scope

### Phase 1: Runtime plumbing and safe gating
- Add optional dial9 dependency/feature(s) in runtime-owning crates, starting with `psionic-serve`, `psionic-mlx-serve`, and `psionic-net`.
- Add a consistent runtime config surface for:
  - trace path
  - enable/disable switch
  - task tracking
  - optional Linux CPU profiling / sched events
- Keep a hard no-op path for normal builds and unsupported environments.
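The feature gating might look roughly like this in each runtime-owning crate's `Cargo.toml` (the feature name, crate version, and exact shape are assumptions, not decisions):

```toml
# Hypothetical sketch for crates/psionic-serve/Cargo.toml.
[features]
default = []
dial9-telemetry = ["dep:dial9-tokio-telemetry"]

[dependencies]
dial9-tokio-telemetry = { version = "0.1", optional = true }
```

Keeping the dependency optional gives normal builds the hard no-op path for free: without the feature, none of the telemetry code is even compiled.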
### Phase 2: Instrument the highest-value server runtimes
- Replace `#[tokio::main]` in:
  - `crates/psionic-serve/src/bin/psionic-openai-server.rs`
  - `crates/psionic-serve/src/bin/psionic-gpt-oss-server.rs`
  - `crates/psionic-mlx-serve/src/bin/psionic-mlx-serve.rs`

  with explicit runtime builders wrapped by `TracedRuntime`.
- Instrument `axum::serve(...)` entrypoints in `crates/psionic-serve/src/openai_http.rs`.
- Route server-owned spawns through `TelemetryHandle::spawn` where wake tracking materially improves trace value.
- For Hyper/Axum connection tasks, adopt a traced executor / accept-loop wrapper rather than assuming plain `tokio::spawn` is enough.
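The `#[tokio::main]` replacement could look roughly like this. Only the `tokio::runtime::Builder` portion is standard Tokio; the `TracedRuntime` wrapping shown in the comment is a guess at dial9's API shape, not its confirmed interface:

```rust
// Before: #[tokio::main] async fn main() { serve().await }
// After (sketch): build the runtime explicitly so dial9 can wrap or observe it.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()?;
    // Hypothetical: hand the runtime to dial9 before serving, e.g.
    // let runtime = dial9_tokio_telemetry::TracedRuntime::wrap(runtime, config)?;
    runtime.block_on(async {
        // serve().await
    });
    Ok(())
}
```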
### Phase 3: Instrument transport runtimes
- Add dial9 coverage to the long-lived transport loop in `crates/psionic-net/src/lib.rs`.
- Confirm we can capture:
  - poll durations
  - timer tick behavior
  - wake-to-poll delay during socket bursts
  - worker imbalance or scheduling delay on Linux
### Phase 4: Docs and operator workflow
- Document required flags and caveats:
  - `tokio_unstable`
  - frame pointers for Linux CPU profiling
  - `perf_event_paranoid` / `kptr_restrict`
- Add a minimal runbook for collecting traces locally and in staging/production.
- Explain what dial9 can and cannot tell us in Psionic given the `std::thread` worker architecture.
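A minimal local build sketch for the runbook, assuming Linux and a cargo workspace. The `--cfg tokio_unstable` flag and `force-frame-pointers` codegen option are real rustc/Tokio mechanisms; the `dial9-telemetry` feature name and the exact sysctl values are assumptions:

```shell
# Enable Tokio's unstable hooks (required by dial9) and keep frame pointers
# so Linux CPU sampling can unwind stacks.
export RUSTFLAGS="--cfg tokio_unstable -C force-frame-pointers=yes"
echo "$RUSTFLAGS"

# Linux-only, needs root: relax perf restrictions for CPU sampling.
# sudo sysctl -w kernel.perf_event_paranoid=1
# sudo sysctl -w kernel.kptr_restrict=0

# Build a server binary with telemetry enabled (feature name is hypothetical).
# cargo build -p psionic-serve --features dial9-telemetry
```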
## Lower-priority targets

- `crates/psionic-apple-fm/src/client.rs`

This is useful, but it should come after the main serve and transport surfaces.
## Non-goals
- Do not instrument pure test-only `Runtime::new().block_on(...)` harnesses first.
- Do not treat dial9 as a replacement for backend-specific profiling of CUDA/Metal/MLX/GGUF execution.
- Do not move product/app concerns into Psionic just to support telemetry rollout.
## Acceptance criteria
- We can enable dial9 on the main Psionic server binaries without code changes to downstream callers.
- A disabled configuration is effectively a no-op.
- On Linux, traces let us distinguish scheduler delay vs handler CPU work vs backend wait time.
- On non-Linux, we still get useful runtime poll/park/wake visibility.
- At least one documented trace collection flow exists for:
  - `psionic-openai-server`
  - `psionic-gpt-oss-server`
  - `psionic-mlx-serve`
- Transport tracing exists for `psionic-net`, or the issue is explicitly split and deferred with rationale.