mesh: wire frontend network catalog into daemon, call start() at boot #205

Merged
mrjeeves merged 3 commits into main from claude/ecstatic-goodall-RMOay
May 28, 2026

@mrjeeves
Owner

Symptom

After PRs #203 + #204 merged, every install was stuck at "Joining <handle>…" with no peers visible — neither side could see the other. User-confirmed reproduction:

2026-05-28T02:42:38.336065Z  INFO daemon starting version="0.1.2" networks=0
2026-05-28T02:42:38.336598Z  INFO mesh opened device_id=...
2026-05-28T02:42:38.336720Z  INFO identity ready device_id=...
2026-05-28T02:42:38.336919Z  INFO control socket listening target=Name("myownmesh.sock")
daemon: own-LLM daemon up

Daemon healthy, control socket bound, identity loaded — but networks=0 and the UI never advances past phase=starting.

Root cause

Two gaps the Phase B PRs left behind (the unticked C-1 / C-6 checkboxes in #203):

  1. meshClient.start() was never called. App.svelte:230 calls meshClient.reconcile() at boot. In the legacy mesh-client.svelte.ts (Trystero) this kicked off the room join. The new mesh-daemon.svelte.ts::reconcile() just re-snapshots peers — it never subscribes to mesh://event, never installs RPC handlers, never advances phase past off. grep -rn "meshClient.start" src/ returned only the definition.
  2. The frontend's saved-network catalog was never pushed into the daemon. config.ts::addNetwork / setActiveNetwork / removeNetwork mutated ~/.myownllm/config.json only; no mesh_daemon_network_add call existed anywhere. Even after fixing (1), start() would dead-end at joined_networks[0] ?? "" because the daemon's own config under MYOWNMESH_HOME=~/.myownllm starts empty after the migration.

Fix

src/ui/App.svelte — boot calls meshClient.start() instead of meshClient.reconcile(). start() owns event subscription, peer snapshot, RPC handler install, capability publish — the full bring-up.

src/mesh-daemon.svelte.ts

  • start() bootstraps the daemon's joined-network set from the frontend's active network via syncActiveNetworkToDaemon. Single-active-network UX, so any other daemon-joined networks get dropped to keep state aligned with what the LLM displays.
  • start() serialised through an inflightStart promise so a boot call + an early settings click can't double-bootstrap and leak the first event listener.
  • start() gains a bounded retry on mesh_daemon_status so the brief race window between Tauri's setup() returning and the async daemon-spawn task calling app.manage() doesn't surface as a hard error.
  • reconcile() now detects an active-network switch (Switch button, addNetwork({activate:true})) and does a full stop() → start() so handler claims rebind under the new this.network and the daemon leaves/joins as needed. Same-network refresh path retained.
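
The `inflightStart` serialisation can be pictured with a minimal sketch. This is not the code from `mesh-daemon.svelte.ts`: `MeshStarter`, the injected `startImpl` callback and the `bootstraps` counter are hypothetical names, used only to show how a second caller can be handed the first caller's promise instead of starting a second bring-up.

```typescript
// Hypothetical stand-in for the daemon client; only the serialisation
// pattern is illustrated here.
class MeshStarter {
  private inflightStart: Promise<void> | null = null;
  private readonly startImpl: () => Promise<void>;
  // Counts how many times the real bring-up was actually kicked off.
  bootstraps = 0;

  constructor(startImpl: () => Promise<void>) {
    this.startImpl = startImpl;
  }

  // Deliberately not `async`: concurrent callers must receive the
  // identical promise object, not a fresh wrapper around it.
  start(): Promise<void> {
    if (this.inflightStart !== null) return this.inflightStart;
    this.bootstraps += 1;
    const run = this.startImpl().finally(() => {
      // Clear the slot so a later start() (for example after stop()) runs again.
      this.inflightStart = null;
    });
    this.inflightStart = run;
    return run;
  }
}
```

With this shape, a boot call and an early settings click that both call `start()` share one bring-up, so only one event listener is ever installed per start.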

src/config.ts — new bridge helpers:

  • networkConfigToDaemonShape(net) translates the frontend's flat schema (signaling_servers: string[], etc.) into the daemon's structured myownmesh_core::config::NetworkConfig shape (SignalingConfig, StunServer { urls }, TurnServer { urls }).
  • daemonAddNetwork / daemonRemoveNetwork — idempotent wrappers (treat "already in use" + "unknown network" as success).
  • syncActiveNetworkToDaemon(cfg) — convergence: drop every joined network ≠ active, add active if missing. Reentrant; second launch is a no-op once the daemon has persisted the network under MYOWNMESH_HOME.
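
The convergence rule behind `syncActiveNetworkToDaemon` can be sketched as a pure planning step. The names `SyncPlan`, `planDaemonSync` and `isBenignDaemonError` are hypothetical, and matching the two error messages by substring is an assumption about how the idempotent wrappers recognise them; the real helpers issue daemon calls rather than returning a plan.

```typescript
// Hypothetical plan type: which joined networks to leave, and which to join.
interface SyncPlan {
  remove: string[];
  add: string | null;
}

// Drop every joined network that is not the active one; add the active
// network only if the daemon has not already joined it.
function planDaemonSync(joined: readonly string[], active: string | null): SyncPlan {
  const remove = joined.filter((id) => id !== active);
  const add = active !== null && joined.indexOf(active) === -1 ? active : null;
  return { remove, add };
}

// Assumption: the daemon reports these two conditions as error text, and
// the wrappers treat both as success so repeated calls stay harmless.
function isBenignDaemonError(message: string): boolean {
  return message.includes("already in use") || message.includes("unknown network");
}
```

On a second launch the daemon already holds the active network, so the plan is empty (`remove: []`, `add: null`), which is the no-op case described above.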

Known gap (deliberate, follow-up)

Mid-session settings edits to the active network's STUN / TURN / signaling lists aren't auto-propagated — the daemon has no network-update RPC, only add + remove. Documented in reconcile()'s doc; user toggles the network off+on to apply changes. Follow-up can detect config drift and remove-then-re-add.

Test plan

  • pnpm run check — 164 files, 0 errors, 0 warnings.
  • pnpm run build — clean.
  • Two devices: each launches, sees the other's pending-approval knock, approves, peers go active. (Needs hardware to validate.)
  • Switch active network in Settings: UI shows "Joining <new-handle>…" then online; daemon log shows leave-old + join-new.
  • Add a new network with activate: UI converges on the new network without a restart.
  • Remove active network: UI shows "No active network — pick one below"; daemon log shows leave.
  • Cold launch after the network was added once: daemon's own config.json has it; bootstrap is a no-op; UI lands on online directly.

Generated by Claude Code

claude added 3 commits May 28, 2026 03:10
PRs #203 and #204 landed the daemon plumbing and the sidecar bundle,
but left two gaps that left every install stuck pre-join after the
migration off Trystero:

1. `meshClient.start()` was never invoked. App.svelte called
   `meshClient.reconcile()` at boot — which now (in the daemon
   client) just refreshes peers without subscribing to
   `mesh://event` or advancing past phase=off. The status pill
   stayed at "Joining <handle>…" forever.
2. The frontend's saved-network catalog was never pushed into the
   daemon. The daemon started with `networks=0` regardless of what
   the user had configured in `~/.myownllm/config.json`, so even
   when start() ran it dead-ended at `joined_networks[0] ?? ""`.

Fix:

- App.svelte boot: call `meshClient.start()` instead of
  `meshClient.reconcile()`. Start owns event subscription, peer
  snapshot, RPC handler install, capability publish — all the
  things reconcile() doesn't do.
- mesh-daemon.svelte.ts `start()`: bootstrap the daemon's
  joined-network set from the frontend's active network. Single-
  active-network UX, so drop any daemon networks that aren't the
  current active. Idempotent — second launch sees the network
  already joined (daemon persisted it under MYOWNMESH_HOME) and
  is a no-op.
- mesh-daemon.svelte.ts `reconcile()`: when active network
  changes under us (Switch button, addNetwork-with-activate),
  stop + start so handler claims rebind under the new network
  and the daemon-side leave/join converges.
- config.ts: helpers (`networkConfigToDaemonShape`,
  `daemonAddNetwork`, `daemonRemoveNetwork`,
  `syncActiveNetworkToDaemon`) translate between the frontend's
  flat schema (`signaling_servers: string[]`, etc.) and the
  daemon's structured `myownmesh_core::config::NetworkConfig`
  shape (`SignalingConfig`, `StunServer { urls }`, etc.).
- start() also gains a short retry on `mesh_daemon_status` so a
  boot that races the daemon-spawn task in Rust's setup() doesn't
  surface as a hard error during the brief window before the
  state is `app.manage()`d.
- start() serialised via an `inflightStart` promise so a boot
  call + an early settings click can't double-bootstrap and leak
  the first event listener.
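
A bounded retry of the kind described for `mesh_daemon_status` might look like the sketch below. `withBoundedRetry` and `sleep` are hypothetical helpers, and the attempt count and delay are left as parameters because this message does not state the values the real code uses.

```typescript
// Resolve after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Call `call` up to `maxAttempts` times, waiting `delayMs` between
// failures. The last error is rethrown once the attempts are used up,
// so a daemon that never comes up still surfaces as a hard error.
async function withBoundedRetry<T>(
  call: () => Promise<T>,
  maxAttempts: number,
  delayMs: number,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts) await sleep(delayMs);
    }
  }
  throw lastError;
}
```

The bound matters: retrying forever would hide a daemon that failed to spawn, while a short bounded loop only absorbs the window before the state is `app.manage()`d.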

Mid-session settings edits to the active network's STUN / TURN /
signaling lists aren't auto-propagated — the daemon has no
network-update RPC, only add/remove. Documented in `reconcile()`'s
doc; toggle the network off+on to apply changes.

`pnpm run check` clean (164 files, 0 errors).
`pnpm run build` clean.

PR #204's sidecar bundling prefers a sibling MyOwnMesh checkout's
`target/<profile>/myownmesh` binary over the GitHub release
download, on the assumption that "if the user has a sibling
checkout, they want it." That assumption skipped a check the user
hit in the wild: the sibling target/ is whatever the user last
built, NOT necessarily the rev pinned in `.myownmesh-rev`.

Concrete failure: one device's sibling at v0.1.1 + pin at v0.1.2
→ build.rs copied the v0.1.1 binary; the daemon's startup log
reported `version="0.1.1"`. The user's other device had no
sibling target build → fell through to release download, got
v0.1.2. The two daemons couldn't peer because the wire-protocol
additions in v0.1.2's PR #16 (the RPC + typed-channel + capability
ops) aren't understood by v0.1.1.

Fix: when the sibling exists, run `<binary> --version` and
compare against the pin. On match, use the sibling. On mismatch
or unreadable version, loud warning + fall through to the
release download. The escape hatch for users hacking against a
non-pinned MyOwnMesh version (env var `MYOWNLLM_MESH_BIN` →
explicit binary path, handled in step 1) is unchanged and
bypasses the version check entirely.
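
The decision rule can be sketched as follows. The real check lives in `build.rs` (Rust); TypeScript is used here only to show the logic. `parseVersion` and `chooseBinarySource` are hypothetical names, and the sketch assumes both the `--version` output and the `.myownmesh-rev` pin contain a dotted `major.minor.patch` string.

```typescript
type BinarySource = "sibling" | "release-download";

// Pull the first x.y.z-looking token out of a string, or null if none.
function parseVersion(text: string): string | null {
  const match = /\d+\.\d+\.\d+/.exec(text);
  return match ? match[0] : null;
}

// `siblingVersionOutput` is what `<binary> --version` printed, or null
// when there is no sibling binary or it could not be run.
function chooseBinarySource(
  siblingVersionOutput: string | null,
  pinnedRev: string,
): BinarySource {
  if (siblingVersionOutput === null) return "release-download";
  const found = parseVersion(siblingVersionOutput);
  const pinned = parseVersion(pinnedRev);
  if (found !== null && pinned !== null && found === pinned) return "sibling";
  // Mismatch or unreadable version: warn loudly, then fall through.
  console.warn(`sibling binary reports ${found ?? "unknown"}, pin is ${pinnedRev}; using release download`);
  return "release-download";
}
```

For the failure above, a sibling reporting 0.1.1 against a v0.1.2 pin yields `"release-download"`, so both devices end up on the pinned daemon.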

Also write `.bundled-rev` when the sibling path succeeds so the
next build's idempotency short-circuit can find it.

Standalone `rustc --edition=2021 src-tauri/build.rs ...` clean
(the only diagnostic is the expected unresolved `tauri_build`
crate from the build-dep that lives outside this sandbox).

mrjeeves merged commit 26dd70a into main May 28, 2026
4 checks passed
mrjeeves deleted the claude/ecstatic-goodall-RMOay branch May 28, 2026 05:31
mrjeeves added a commit that referenced this pull request May 28, 2026
After the Phase B–D daemon migration (PR #203/#205) the LLM was
joined to the mesh, but the network-feature surface (remote
inference, hardware advertisement, settings sync, late-joiner
catch-up) wasn't actually working. Eight gaps that nominally landed
as "Phase C-6 / D" in #203 were in practice left as TODOs.

**1. Capabilities stripped by the daemon shoulder.**

The daemon's `CapabilityAdvert` is `{tags, app_version,
max_connections, extra}`. The LLM was pushing the structured
`Capabilities` blob (`{llms, asr, diarize, hardware, inputs,
outputs, accepting, app_version, features}`) directly, which
serde silently dropped on deserialize — peers always saw each
other as "no LLMs / no ASR / no hardware", which broke every
piece of LLM-side capability-keyed routing (remote inference
peer picker, transcribe peer picker, the LLM/ASR chips in
Connections).

Fix: pack the full `Capabilities` into `CapabilityAdvert.extra`
before pushing; unpack in `daemonPeerToEntry` via a new
`peerCapabilitiesFromAdvert` helper that validates each field
and falls back to empty defaults. `CapabilityAdvert.app_version`
takes precedence over the inner copy since the daemon promotes
that field in `hello` for cosmetic display.
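
A minimal sketch of the pack/unpack round trip, under stated assumptions: `CapabilitiesSketch` keeps only four of the real fields, the field types and the `Record<string, unknown>` type for `extra` are guesses, and `packIntoAdvert` / `unpackFromAdvert` are hypothetical stand-ins for the `pushCapabilities` path and `peerCapabilitiesFromAdvert`.

```typescript
// Reduced, assumed shapes; the real Capabilities blob has more fields.
interface CapabilitiesSketch {
  llms: string[];
  asr: string[];
  accepting: boolean;
  app_version: string;
}

interface CapabilityAdvertSketch {
  tags: string[];
  app_version: string;
  max_connections: number;
  extra: Record<string, unknown>;
}

// Carry the whole structured blob inside `extra` so fields the advert
// schema does not know about are not dropped on deserialize.
function packIntoAdvert(caps: CapabilitiesSketch, maxConnections: number): CapabilityAdvertSketch {
  return { tags: [], app_version: caps.app_version, max_connections: maxConnections, extra: { ...caps } };
}

// Validate each field and fall back to an empty default on bad input.
function unpackFromAdvert(advert: CapabilityAdvertSketch): CapabilitiesSketch {
  const extra = advert.extra ?? {};
  const strings = (v: unknown): string[] =>
    Array.isArray(v) ? v.filter((x): x is string => typeof x === "string") : [];
  const inner = extra["app_version"];
  return {
    llms: strings(extra["llms"]),
    asr: strings(extra["asr"]),
    accepting: extra["accepting"] === true,
    // The outer advert field wins over the copy packed inside `extra`.
    app_version: advert.app_version || (typeof inner === "string" ? inner : ""),
  };
}
```

Validating field by field means a peer running a different build can send a partial or malformed `extra` without breaking the peer list; it simply shows up with empty capabilities.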

**2. Local inference handler 404'd every remote call.**

`localCapabilitiesForHandler()` hard-returned `llms: []` (marked
as "Phase C-6 wires this for real"), so even when a peer routed
inference to us we hit `streamRpcEnd("no local LLM available")`
and never reached Ollama.

Fix: cache the last-pushed `Capabilities` in
`lastLocalCapabilities` (populated by `pushCapabilities`); the
handler now sees the live LLM list and can pick a model by
(family, mode) exactly the way the legacy mesh-client did.

**3. Local mutations never broadcast.**

`agentPermissions.setBroadcaster(...)` and
`agentPrompts.setBroadcaster(...)` are the hooks both stores fire
on every local edit (`persistPatch` / `persistList`). The legacy
client wired them; the new client never did, so editing a tool
permission or saving a prompt was silent on the wire.

Fix: install both broadcasters in `startImpl()` and release them
via the `featureReleases` array on `stop()`. Both are gated on
`autoGossipEnabled` inside the callback so the network's
isolation contract (auto-gossip off → no outbound) holds.
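
The install-and-release pattern might be sketched like this. `SketchStore` is a hypothetical stand-in for `agentPermissions` / `agentPrompts`; the only behaviour taken from the text above is that `setBroadcaster` registers a hook fired on local edits. Passing `null` to release the hook is an assumption, as is the `installBroadcaster` helper itself.

```typescript
type Broadcaster<T> = (snapshot: T) => void;

// Hypothetical store with a single broadcaster slot.
class SketchStore<T> {
  private broadcaster: Broadcaster<T> | null = null;

  setBroadcaster(b: Broadcaster<T> | null): void {
    this.broadcaster = b;
  }

  // Stands in for a local edit that would persist and then notify.
  edit(snapshot: T): void {
    if (this.broadcaster !== null) this.broadcaster(snapshot);
  }
}

// Install a gossip-gated broadcaster and queue its release for stop().
function installBroadcaster<T>(
  store: SketchStore<T>,
  isGossipEnabled: () => boolean,
  publish: (snapshot: T) => void,
  featureReleases: Array<() => void>,
): void {
  store.setBroadcaster((snapshot) => {
    // Checked at call time, so toggling gossip takes effect immediately.
    if (!isGossipEnabled()) return;
    publish(snapshot);
  });
  featureReleases.push(() => store.setBroadcaster(null));
}
```

Reading the flag inside the callback, rather than at install time, is what lets the isolation contract hold when the user turns gossip off mid-session.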

**4. Permissions wire shape was wrong.**

`publishPermissions` was shipping the daemon's *roster list*
(`{authorized: [{device_id, label}], ts}`) on the
`permissions/snapshot` channel — meaningless for the actual
feature, which is per-tool agent gates (shell, write_file).
Even if the merge had been wired, the incoming data would have
been useless.

Fix: ship `{tools: {shell, write_file}, ts}` matching the shape
`agentPermissions.mergeIncoming` consumes. New
`publishPermissionsSnapshot(client, snap)` helper lets the
`setBroadcaster` callback ship a pre-formed snapshot without
re-reading config from disk on every mutation.

Prompts had the same problem at lower stakes — `publishPrompts`
was lossy-mapping each prompt to `{id, label, body}`, dropping
`tools`, `user_prompt`, and `updated_at`. Now ships the full
`Prompt` shape so `agentPrompts.mergeIncoming` can do per-id
LWW correctly.
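
Why `updated_at` has to travel on the wire is easiest to see in a sketch of per-id last-writer-wins. `PromptSketch` and `mergePromptsLww` are hypothetical; the sketch assumes `updated_at` is a numeric timestamp and that ties keep the local copy, neither of which is stated above.

```typescript
// Reduced prompt shape; the real Prompt also carries tools and user_prompt.
interface PromptSketch {
  id: string;
  label: string;
  body: string;
  updated_at: number; // assumed: numeric timestamp, larger is newer
}

// Per-id last-writer-wins: an incoming prompt replaces the local one
// only if it is strictly newer, and unknown ids are added.
function mergePromptsLww(
  local: readonly PromptSketch[],
  incoming: readonly PromptSketch[],
): { merged: PromptSketch[]; changed: boolean } {
  const byId = new Map<string, PromptSketch>();
  for (const p of local) byId.set(p.id, p);
  let changed = false;
  for (const p of incoming) {
    const mine = byId.get(p.id);
    if (mine === undefined || p.updated_at > mine.updated_at) {
      byId.set(p.id, p);
      changed = true;
    }
  }
  return { merged: Array.from(byId.values()), changed };
}
```

Without `updated_at` on the wire the comparison has nothing to compare, so a receiver could only overwrite blindly or ignore the snapshot. The `changed` flag also allows logging only when a merge had an effect.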

**5. Inbound snapshots were logged, not merged.**

The `subscribePermissions` / `subscribePrompts` hooks fired
`appendDiag("info", "permissions snapshot from ...: N entries")`
and stopped. The actual merge into `agentPermissions` /
`agentPrompts` (which is what makes a peer's edit visible
locally) was never called.

Fix: hooks now call `agentPermissions.mergeIncoming(snap.tools,
activeNetworkId)` / `agentPrompts.mergeIncoming(snap.prompts,
activeNetworkId)` and log only when the merge actually changed
something. Gated on `autoGossipEnabled` (isolation contract:
when gossip is off, peer pressure can't mutate our policy).

New `activeConfigNetworkId` field tracks the LLM-side config id
(distinct from `this.network` which is the wire-level
`network_id`) so the merge scopes correctly — a snapshot
arriving on network A doesn't accidentally overwrite network
B's saved policy.

**6. Auto-gossip toggle reset to false every launch.**

`setAutoGossip` updated an in-memory `autoGossipEnabled = false`
field; the UI binds to `active?.auto_gossip` from config (so the
toggle visually reverted on every `reloadFromConfig`); the toggle
was never persisted. The hydration on `start()` was missing too —
even users who'd previously enabled gossip saw it off after
restart.

Fix: hydrate `autoGossipEnabled` from `activeNetwork(cfg)
?.auto_gossip ?? true` on start (matches the legacy default).
`setAutoGossip` persists via `updateNetwork(active.id, {
auto_gossip })`. Toggle now sticks across restarts.

**7. No periodic refresh + no late-joiner replay.**

The daemon's typed channels don't replay past publishes — a peer
who handshakes 30s after our initial publish sees an empty
`peer.catalog`, no prompts, no permissions until our next local
mutation. The legacy client ran a 60s catalog refresh tick + a
once-per-newly-active-peer catch-up broadcast; both were missing.

Fix: 60s `setInterval` re-publishing catalog (+ gossip-gated
perms/prompts). A `shipCatchUpGossipToNewlyActive()` hook fires
from `reconcile()` whenever the peer snapshot changes — newly
active peers get a one-shot catch-up broadcast, tracked in a
`gossipedOnceTo` set that prunes stale entries (so a flap
active → shelved → active gets the catch-up again).

Initial peers (active at start time) get seeded into
`gossipedOnceTo` so the initial broadcast on `start()` isn't
duplicated by the first `reconcile()`.
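
The `gossipedOnceTo` bookkeeping might be sketched as below. `newlyActivePeers` is a hypothetical helper; it assumes peers are identified by string ids and that the caller sends the catch-up broadcast to whatever it returns.

```typescript
// Given the currently active peer ids and the set of peers that already
// received a catch-up, return the peers that still need one. The set is
// updated in place: stale entries are pruned and fresh ones recorded.
function newlyActivePeers(activeNow: readonly string[], gossipedOnceTo: Set<string>): string[] {
  const active = new Set(activeNow);
  // Prune peers that are no longer active, so a peer that flaps
  // active -> shelved -> active is treated as new again.
  gossipedOnceTo.forEach((id) => {
    if (!active.has(id)) gossipedOnceTo.delete(id);
  });
  const fresh = activeNow.filter((id) => !gossipedOnceTo.has(id));
  fresh.forEach((id) => gossipedOnceTo.add(id));
  return fresh;
}
```

Seeding the set with the peers that are active at start time is the same operation run once with its result discarded, which is why the first `reconcile()` finds nothing new.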

**8. `noteCatalogChanged` fired one publish per mutation.**

App-side bulk operations (folder move-N-files, multi-rename)
each call `refreshConversations()` which calls
`noteCatalogChanged()`. Without debounce, a 20-file move = 20
catalog broadcasts.

Fix: 500ms `setTimeout` coalesce in `noteCatalogChanged` — same
shape as the legacy client. Single broadcast at the trailing
edge of the burst.
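
A trailing-edge coalescer in that spirit could look like this sketch. `makeCoalescer` is a hypothetical helper, and it assumes each call restarts the timer; the real `noteCatalogChanged` may instead keep the first timer running. The scheduler is injectable only so the behaviour can be exercised without waiting on real timers.

```typescript
type Schedule = (fn: () => void, ms: number) => unknown;
type Cancel = (handle: unknown) => void;

// Returns a function that can be called on every mutation; `publish`
// runs once, `delayMs` after the last call in a burst.
function makeCoalescer(
  publish: () => void,
  delayMs: number = 500,
  schedule: Schedule = (fn, ms) => setTimeout(fn, ms),
  cancel: Cancel = (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
): () => void {
  let pending: unknown = null;
  return () => {
    // Restart the timer so the publish lands at the trailing edge.
    if (pending !== null) cancel(pending);
    pending = schedule(() => {
      pending = null;
      publish();
    }, delayMs);
  };
}
```

With this in place a 20-file move calls the returned function 20 times but triggers a single catalog broadcast once the burst has gone quiet.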

---

Files:

- `src/mesh-daemon.svelte.ts`: +368 / -37. New helper
  (`peerCapabilitiesFromAdvert`), pack/unpack wiring on
  `pushCapabilities`/`daemonPeerToEntry`, `lastLocalCapabilities`
  cache feeding `localCapabilitiesForHandler`, broadcaster
  wiring + release, inbound merge hooks, `activeConfigNetworkId`
  field, autoGossipEnabled hydration + persistence, periodic
  refresh interval, catch-up gossip path, catalog debounce.

- `src/mesh-gossip.ts`: +68 / -36. Fixed permissions wire shape
  (`{tools}` not roster), full Prompt[] in prompts wire,
  `publishPermissionsSnapshot` / `publishPromptsSnapshot`
  variants for `setBroadcaster` callers, dropped the obsolete
  roster-list flow.

**Validation:**
- `pnpm run check`: 164 files, 0 errors, 0 warnings.
- `pnpm run build`: clean.
- Rust unchanged — Tauri build env (gdk-3.0) isn't installed in
  the sandbox so `cargo check` can't run; no `.rs` files touched.

https://claude.ai/code/session_01RLu1LdTgtxEDdzhybzqFrk

Co-authored-by: Claude <noreply@anthropic.com>
mrjeeves added a commit that referenced this pull request May 28, 2026
…on (#207)

The migration off Trystero onto the standalone myownmesh daemon
(PRs #201 / #203 / #204 / #205 / #206) shipped the code but left
every doc still describing the world before the move:

- README claimed mesh discovery went "via Trystero over public
  Nostr relays" and that agent permissions persisted under
  `Config.agent_permissions.by_device[<device_id>]`.
- ARCHITECTURE.md's mesh-module section described `mesh-client.svelte.ts`
  (deleted), Trystero room ownership (gone), and a TS module table
  that didn't list any of the files Phase C–D actually shipped
  (`mesh-daemon.svelte.ts`, `mesh-gossip.ts`, `mesh-inference.ts`,
  `mesh-file.ts`, `mesh-move.ts`, `mesh-transcribe.ts`,
  `mesh-governance.ts`).
- CONNECTION-ENGINE.md was a 535-line spec for the 4-layer
  connection engine that no longer lives in this repo — every
  paragraph referenced `src/mesh-client.svelte.ts` or
  `mesh-scheduler-worker.ts`, neither of which exists.
- DOCS.md's Cloud Mesh section walked the user through Trystero
  rooms, the legacy on-the-wire `MeshMessage` JSON envelope
  (`infer_request` / `infer_chunk` / `move_offer` / `file_offer`),
  and a config example missing every field the per-network
  schema gained (`label`, `kind`, `topology`, `auto_approve`,
  `auto_gossip`, `agent_permissions`, `prompts`).
- PROGRESS.md was a historical bug-fix doc for a Trystero
  subscription-state quirk that no longer applies — the engine
  isn't here anymore.

What this commit changes:

**README.md**: replace Trystero claim with the bundled
`myownmesh` daemon model; correct the agent-permissions storage
path to the per-network shape (`Config.cloud_mesh.networks[*].
agent_permissions`) and mention the `auto_gossip` gate.

**ARCHITECTURE.md**: rewrite the one-picture diagram to show
the daemon sidecar alongside Ollama; rewrite the mesh intro
paragraph; rewrite the `mesh/` Rust module row to describe
`daemon.rs`, `daemon_commands.rs`, the detect-and-share socket
order, and the relationship to `myownmesh_core`; rewrite the
TS module table to list every `mesh-*.ts` file actually in the
tree with its current role; refresh the CloudMesh sub-tab
inventory (Status / Settings / Connections / Graph / Governance
/ Activity / HTTP); refresh the persistence section to show
`daemon.sock` + the per-network config layout.

**CONNECTION-ENGINE.md**: rewrite as a short pointer. The
4-layer engine + 7-tier reconnect ladder live in MyOwnMesh now;
this doc explains what the LLM still owns on top (the layer-4
LLM-specific protocol), how the LLM talks to the daemon
(detect-and-share IPC), and lists the LLM-side RPC methods +
typed channels currently in use (`infer`, `transcribe`,
`file_offer` / `file_send` + `file_chunks/<id>`, `session_*` /
`move_*`, `catalog/announce`, `permissions/snapshot`,
`prompts/snapshot`).

**DOCS.md Cloud Mesh section**: replace the Trystero transport
paragraph with the daemon's detect-and-share model; refresh
every What-the-mesh-does-for-you row to match current behavior
(click-to-open, click-through Pull, file transfer wire shape,
permissions+prompts gossip with the auto_gossip gate, Graph
view, Governance view, no Phase-1/Phase-2 split); replace the
JSON-over-data-channel wire-protocol box with the daemon
RPC + typed-channel surface; refresh the example config to
include `label`, `kind`, `topology`, `auto_approve`,
`auto_gossip`, `agent_permissions`, `prompts`.

**PROGRESS.md**: deleted. The Trystero subscription-state bug
it documents doesn't apply post-daemon. Two `// see PROGRESS.md`
breadcrumbs in `src-tauri/src/asr/mod.rs` and
`src-tauri/src/diarize/cluster.rs` updated to free-standing
explanations.

Validation:
- `pnpm run check`: 164 files, 0 errors, 0 warnings.
- `grep -rn "Trystero\|trystero\|mesh-client\.svelte" --include="*.md" .`
  returns nothing.
- `grep -rn "PROGRESS.md" .` returns nothing.

https://claude.ai/code/session_01RLu1LdTgtxEDdzhybzqFrk

Co-authored-by: Claude <noreply@anthropic.com>