feat: add dev mode for local development without registry #2084
Conversation
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Sandbox process killed immediately when --watch is not used
- Added Ctrl+C wait loop in non-watch mode to keep sandbox alive until user explicitly terminates the session.
- ✅ Fixed: Package latest endpoint exposed without authentication to all environments
- Moved /packages/:package/latest route back to protected_routes to require authentication.
Or push these changes by commenting:
@cursor push 45811797be
Preview (45811797be)
diff --git a/crates/meroctl/src/cli/dev.rs b/crates/meroctl/src/cli/dev.rs
--- a/crates/meroctl/src/cli/dev.rs
+++ b/crates/meroctl/src/cli/dev.rs
@@ -97,7 +97,7 @@
// Step 4: Print summary
self.print_summary(environment, application_id, context_id, member_public_key)?;
- // Step 5: Watch loop (blocks until ctrl-c)
+ // Step 5: Watch loop or wait for Ctrl+C
if self.watch {
let watch_target = if self.path.as_std_path().is_dir() {
self.path.canonicalize_utf8()?
@@ -115,6 +115,13 @@
member_public_key,
)
.await?;
+ } else {
+ // Keep the sandbox alive until Ctrl+C
+ eprintln!(" Press Ctrl+C to stop the dev session");
+ eprintln!();
+ tokio::signal::ctrl_c().await?;
+ eprintln!();
+ eprintln!(" Shutting down dev session...");
}
Ok(())
diff --git a/crates/server/src/admin/service.rs b/crates/server/src/admin/service.rs
--- a/crates/server/src/admin/service.rs
+++ b/crates/server/src/admin/service.rs
@@ -109,6 +109,7 @@
// Package management
.route("/packages", get(list_packages::handler))
.route("/packages/:package/versions", get(list_versions::handler))
+ .route("/packages/:package/latest", get(get_latest_version::handler))
// Context management
.route(
"/contexts",
@@ -322,9 +323,6 @@
let public_routes = Router::new()
.route("/health", get(health_check_handler))
- // Package lookup is public so auth-frontend can check if an app
- // is already installed before the user has a token (dev mode).
- .route("/packages/:package/latest", get(get_latest_version::handler))
// Dummy endpoint used to figure out if we are running behind auth or not
.route("/is-authed", get(is_authed_handler))
.route("/certificate", get(certificate_handler))You can send follow-ups to this agent here.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 95% | Review time: 341.2s
🟡 3 warnings, 💡 4 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-969efba5
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 47% | Review time: 301.0s
🟡 7 warnings, 💡 5 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-253865bc
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: cleanup_stale_sandbox kills all near-sandbox processes indiscriminately
  - Replaced indiscriminate pkill with PID file tracking to only kill the specific sandbox process started by this meroctl instance.
Or push these changes by commenting:
@cursor push a002fa5f28
Preview (a002fa5f28)
diff --git a/Cargo.lock b/Cargo.lock
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -5972,6 +5972,7 @@
"flate2",
"futures-util",
"hex",
+ "libc",
"libp2p",
"near-crypto",
"near-jsonrpc-client",
diff --git a/crates/meroctl/Cargo.toml b/crates/meroctl/Cargo.toml
--- a/crates/meroctl/Cargo.toml
+++ b/crates/meroctl/Cargo.toml
@@ -9,6 +9,9 @@
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+[target.'cfg(unix)'.dependencies]
+libc = "0.2"
+
[dependencies]
axum = { workspace = true, features = ["http1"] }
camino = { workspace = true, features = ["serde1"] }
diff --git a/crates/meroctl/src/sandbox.rs b/crates/meroctl/src/sandbox.rs
--- a/crates/meroctl/src/sandbox.rs
+++ b/crates/meroctl/src/sandbox.rs
@@ -26,6 +26,10 @@
pub contract_id: String,
}
+fn pid_file_path(home_dir: &Path) -> PathBuf {
+ home_dir.join("sandbox.pid")
+}
+
impl DevSandbox {
pub fn rpc_port() -> u16 {
SANDBOX_RPC_PORT
@@ -49,7 +53,7 @@
let binary_path = ensure_sandbox_binary(&home_dir).await?;
- cleanup_stale_sandbox();
+ cleanup_stale_sandbox(&home_dir);
let data_dir = home_dir.join("data");
if data_dir.exists() {
@@ -86,6 +90,9 @@
.stderr(Stdio::null())
.spawn()?;
+ let pid = process.id();
+ let _ = std::fs::write(pid_file_path(&home_dir), pid.to_string());
+
let rpc_url = format!("http://localhost:{SANDBOX_RPC_PORT}");
wait_for_rpc(&rpc_url).await?;
@@ -142,6 +149,7 @@
if let Some(mut proc) = self.process.take() {
let _ = proc.kill();
let _ = proc.wait();
+ let _ = std::fs::remove_file(pid_file_path(&self.home_dir));
eprintln!(" NEAR sandbox stopped");
}
}
@@ -208,13 +216,28 @@
Ok(binary_path)
}
-fn cleanup_stale_sandbox() {
- let _ = Command::new("pkill")
- .args(["-9", "near-sandbox"])
- .stdout(Stdio::null())
- .stderr(Stdio::null())
- .status();
- std::thread::sleep(Duration::from_millis(500));
+fn cleanup_stale_sandbox(home_dir: &Path) {
+ let pid_path = pid_file_path(home_dir);
+ if let Ok(pid_str) = std::fs::read_to_string(&pid_path) {
+ if let Ok(pid) = pid_str.trim().parse::<i32>() {
+ #[cfg(unix)]
+ {
+ unsafe {
+ libc::kill(pid, libc::SIGKILL);
+ }
+ }
+ #[cfg(not(unix))]
+ {
+ let _ = Command::new("taskkill")
+ .args(["/F", "/PID", &pid.to_string()])
+ .stdout(Stdio::null())
+ .stderr(Stdio::null())
+ .status();
+ }
+ std::thread::sleep(Duration::from_millis(500));
+ }
+ let _ = std::fs::remove_file(&pid_path);
+ }
}
async fn wait_for_rpc(rpc_url: &str) -> Result<()> {
E2E Rust Apps Failed: One or more E2E workflows (e2e-kv-store, xcall-example) failed after retries. Please check the workflow logs for more details.
E2E Blockchain Proposals Failed: The following proposal workflow(s) failed:
Please check the workflow logs for more details.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Accidentally committed investigation document at repo root
- Removed FIX-HASH-COMPARISON-SYNC.md and docs/issues/sync-fix-agent-prompts.md which were working artifacts accidentally included in the PR.
- ✅ Fixed: Sync method uses unnecessary blocking in async context
- Converted print_summary from a sync method using block_in_place/block_on to an async fn that directly awaits the get_application call.
Or push these changes by commenting:
@cursor push 21f3756872
Preview (21f3756872)
diff --git a/FIX-HASH-COMPARISON-SYNC.md b/FIX-HASH-COMPARISON-SYNC.md
deleted file mode 100644
--- a/FIX-HASH-COMPARISON-SYNC.md
+++ /dev/null
@@ -1,148 +1,0 @@
-# BUG: HashComparison sync protocol fails to transfer data when initiator has newer state
-
-## Status
-**Severity**: Critical — breaks all fuzzy/load tests on master and CI
-**CI failure**: https://github.com/calimero-network/core/actions/runs/22663348288/job/65688985915
-**Affected since**: Current master (`04166de2`)
-
-## What happened
-
-The KV Store Fuzzy Load Test fails at the "Wait for seed data sync" step. After node 1 writes 10 seed values, nodes 2-4 never receive them. The root cause is **not** the NEAR sandbox crashing (the sandbox is alive the entire time and killed by merobox cleanup after the test fails).
-
-### Observed behavior from node logs
-
-Node 1 (has seed data) initiates sync with node 2 (missing seed data). The sync manager:
-
-1. Detects different root hashes: `local_root=7SE6...` vs `remote_root=AfEv...`
-2. Correctly selects `HashComparison` protocol
-3. **Protocol "completes successfully" but transfers zero data**
-4. Reports `divergent_subtrees: []` (no divergence found)
-5. Repeats every ~1 second for 30 seconds — state never converges
-
-**Key log line** from node 1 (repeated every second for 30s):
-```
-Protocol selected context_id=...
- protocol=HashComparison { root_hash: [...], divergent_subtrees: [] }
- reason=default: using hash comparison
- local_root=7SE6WmEgdeegi9w9nhA2P1dfQvETzDHnKfmd2rFtxEmC ← has seed data
- remote_root=AfEvBVqpzVqSMyV9uYS8EQDMtMeHmwpPBsE2kUYxNDjh ← missing seed data
- local_entities=1
- remote_entities=1
-```
-
-## Root cause analysis
-
-### There are two interacting bugs:
-
-### Bug 1: `entity_count` estimation is always 1
-
-`estimate_entity_count()` in `crates/node/primitives/src/sync/state_machine.rs:125-133`:
-
-```rust
-pub fn estimate_entity_count(root_hash: [u8; 32], dag_heads_len: usize) -> u64 {
- if root_hash == [0; 32] { 0 }
- else if dag_heads_len == 0 { 1 } // ← always 1 if no heads
- else { dag_heads_len as u64 }
-}
-```
-
-With `dag_heads_count=1`, both sides always report `entity_count=1`, even though node 1 has 10+ entities. This means:
-- `estimate_max_depth(1) = 1`
-- `divergence` calculation is skewed
-- Protocol selection doesn't have accurate info
-
-### Bug 2: HashComparison protocol is **pull-only** — cannot push local changes to peers
-
-In `crates/node/src/sync/hash_comparison_protocol.rs`, the initiator starts from the **remote** root hash and pulls data FROM the peer:
-
-```rust
-let mut to_compare: Vec<([u8; 32], bool)> = vec![(remote_root_hash, true)];
-```
-
-When comparing nodes, if a node exists locally but NOT remotely, it hits:
-
-```rust
-TreeCompareResult::RemoteMissing => {
- // Bidirectional sync: future work
-}
-```
-
-**This is a NO-OP.** The protocol silently skips all data that the initiator has but the peer doesn't.
-
-### Why this causes the failure:
-
-When **node 1** (has seed data) initiates sync with **node 2** (missing seed data):
-- Node 1 is the initiator. It pulls node 2's tree (old state).
-- Node 2's tree is a subset of node 1's tree.
-- All node 2 data exists locally → `TreeCompareResult::Equal` or `RemoteMissing`
-- Result: nothing is synced. Protocol reports success with 0 transfers.
-
-When **node 2** initiates sync with **node 1**:
-- Node 2 is the initiator. It pulls node 1's tree (new state).
-- Node 1's tree has extra entities node 2 doesn't have → `LocalMissing` → should recurse and merge
-- **This direction SHOULD work** — but the gossipsub broadcast should also deliver deltas directly.
-
-### Why neither direction works in practice:
-
-The gossipsub broadcast path (primary delta delivery) is also failing or the deltas aren't being processed. This needs investigation separately, but the HashComparison protocol should be a reliable fallback.
-
-The sync runs every ~1 second. Over 30 seconds:
-- Node 1 → Node 2 direction: always no-op (pull-only, node 1 has more data)
-- Node 2 → Node 1 direction: node 2 also runs sync, but something prevents it from successfully pulling the data (possibly the `entity_count=1` estimation causes wrong protocol selection, or the tree walking hits an edge case)
-
-## How to reproduce locally
-
-```bash
-cd workflows/fuzzy-tests/kv-store
-# Create a short test config
-sed 's/duration_minutes: 45/duration_minutes: 2/' fuzzy-test.yml > fuzzy-test-local.yml
-
-# Run with local merod binary
-merobox bootstrap run fuzzy-test-local.yml \
- --no-docker \
- --binary-path ./target/debug/merod \
- --e2e-mode --verbose
-
-# Test fails at "Wait for seed data sync" — nodes 2-4 never receive seed data
-# Check node logs:
-grep "Protocol selected" data/fuzzy-kv-node-1/logs/fuzzy-kv-node-1.log
-# Shows: HashComparison with divergent_subtrees: [] despite different root hashes
-```
-
-Cleanup: `rm -rf data/ fuzzy-test-local.yml`
-
-**Requirements**: merobox (`pip install merobox` or `brew install merobox`), VPN off (mDNS needs local network)
-
-## Files to investigate/fix
-
-| File | Issue |
-|------|-------|
-| `crates/node/src/sync/hash_comparison_protocol.rs` | **Primary fix**: `RemoteMissing` is a no-op — needs to trigger push or reverse-pull |
-| `crates/node/primitives/src/sync/state_machine.rs` | `estimate_entity_count` always returns 1 when `dag_heads_len=1` |
-| `crates/node/src/sync/manager.rs:1114-1117` | `divergent_subtrees` is always hardcoded to `vec![]` in the return value |
-| `crates/node/src/sync/manager.rs:1006-1019` | `query_peer_dag_state` → `select_protocol` flow may need adjustment |
-| `crates/node/src/handlers/state_delta.rs` | Check if gossipsub broadcast of deltas is working (primary path) |
-
-## How to fix
-
-### Option A: Make HashComparison bidirectional (preferred)
-When the initiator detects `RemoteMissing` nodes, it should push those entities to the peer (or trigger the peer to pull them). This makes the protocol work regardless of which side initiates.
-
-### Option B: Detect initiator-has-more and switch to push mode
-If `local_root != remote_root` and the initiator's tree is a superset, switch to a protocol where the initiator sends its extra data.
-
-### Option C: Fix gossipsub delta broadcasting (separate issue)
-The primary sync path (gossipsub broadcast of state deltas after `execute`) should deliver deltas to all subscribed peers within seconds. If this works, the periodic HashComparison is just a fallback. But gossipsub is clearly failing too — otherwise the test would pass.
-
-### Immediate fix suggestion
-The fastest path is likely:
-1. Fix `estimate_entity_count` to use the actual tree/index entity count instead of guessing from DAG heads
-2. In the sync manager, when HashComparison completes with 0 entities merged AND root hashes still differ, trigger a delta sync or snapshot fallback instead of silently reporting success
-3. Investigate why gossipsub broadcast isn't delivering the 10 seed data deltas
-
-## Tests to add
-
-1. **Unit test**: HashComparison protocol where initiator has more data than peer — should still converge
-2. **Unit test**: `estimate_entity_count` with realistic tree sizes
-3. **Integration test**: Write data on one node, verify sync propagation within timeout
-4. **The existing fuzzy test should pass** after the fix
\ No newline at end of file
diff --git a/crates/meroctl/src/cli/dev.rs b/crates/meroctl/src/cli/dev.rs
--- a/crates/meroctl/src/cli/dev.rs
+++ b/crates/meroctl/src/cli/dev.rs
@@ -87,7 +87,8 @@
.await?;
// Step 4: Print summary
- self.print_summary(environment, application_id, context_id, member_public_key)?;
+ self.print_summary(environment, application_id, context_id, member_public_key)
+ .await?;
// Step 5: Watch loop (blocks until ctrl-c)
if self.watch {
@@ -160,7 +161,7 @@
Ok((application_id, context_id, member_public_key))
}
- fn print_summary(
+ async fn print_summary(
&self,
environment: &Environment,
application_id: ApplicationId,
@@ -170,9 +171,7 @@
let client = environment.client()?;
let node_url = client.api_url();
- let app_response = tokio::task::block_in_place(|| {
- Handle::current().block_on(client.get_application(&application_id))
- })?;
+ let app_response = client.get_application(&application_id).await?;
let app = app_response.data.application;
let package_display = app
diff --git a/docs/issues/sync-fix-agent-prompts.md b/docs/issues/sync-fix-agent-prompts.md
deleted file mode 100644
--- a/docs/issues/sync-fix-agent-prompts.md
+++ /dev/null
@@ -1,183 +1,0 @@
-# Sync Fix — Agent Prompts
-
-Three independent tasks to fix the sync protocol issues discovered on Feb 23, 2026. Each can be worked in parallel on a separate branch. All share the same test infrastructure in `apps/sync-test/`.
-
-Read `docs/sync-investigation-2026-02-23.md` for full context before starting any task.
-
----
-
-## Agent 1: Fix gossipsub mesh wait for uninitialized nodes
-
-**Branch**: `fix/sync-mesh-wait-uninitialized`
-**Issue doc**: `docs/issues/sync-issue-1-mesh-wait.md`
-
-### Prompt
-
-You are fixing a critical sync bug in the Calimero node. When a node joins a context, it needs to find gossipsub mesh peers to initiate snapshot sync. The current code retries only 3 times with 500ms delay (1.5s total). On real networks, gossipsub mesh formation takes 5-30+ seconds depending on relay/NAT conditions. Uninitialized nodes that can't find mesh peers stay permanently broken.
-
-**Your task:**
-
-1. Read `docs/issues/sync-issue-1-mesh-wait.md` and `docs/sync-investigation-2026-02-23.md` for full context.
-
-2. In `crates/node/src/sync/manager.rs`, function `perform_interval_sync`:
- - Move the `get_context` + `is_uninitialized` check BEFORE the mesh peer retry loop
- - For uninitialized nodes: retry 10× with 1s delay (10s total)
- - For initialized nodes: keep existing 3× with 500ms (1.5s)
- - Log the `is_uninitialized` flag and `max_retries` in the retry debug message
-
-3. Consider a further improvement: when the node is uninitialized and mesh_peers is still empty after all retries, try `open_stream` to any connected peer (from the general peer list, not context-specific mesh). This handles relay-connected networks where gossipsub mesh never forms for the context topic.
-
-4. Run the existing tests:
- ```bash
- cargo test -p calimero-node-primitives
- cargo check -p calimero-node
- ```
-
-5. Build merod and run the merobox workflow:
- ```bash
- cargo build -p merod
- cd apps/sync-test
- ./build.sh # or: cargo build -p sync-test --target wasm32-unknown-unknown --profile app-release && cp target/wasm32-unknown-unknown/app-release/sync_test.wasm res/
- merobox bootstrap run --no-docker --binary-path ../../target/debug/merod --e2e-mode -v workflows/three-node-sync.yml
- ```
-
-6. Manual verification with the kill/restart scenario:
- ```bash
- cd apps/sync-test && ./run-nodes.sh
- # In another terminal: install app, create context, write data, kill node 3,
- # restart node 3, invite+join — verify it snapshot syncs
- ```
-
-**Acceptance criteria:**
-- Uninitialized nodes find mesh peers and snapshot sync within 15s on localhost
-- Initialized nodes are not affected (same 1.5s retry)
-- All existing tests pass
-- Both merobox workflows (3-node and 6-node) pass
-
----
-
-## Agent 2: Move NEAR RPC out of key share stream handler
-
-**Branch**: `fix/sync-key-share-no-rpc-block`
-**Issue doc**: `docs/issues/sync-issue-2-key-share-blocks-on-near-rpc.md`
-
-### Prompt
-
-You are fixing the primary cause of the Feb 20 production sync failure. When a node receives a key share request from a new member, `internal_handle_opened_stream` calls `sync_context_config` which does 3+ NEAR view calls. These RPCs can take >10s through a relayer, causing the initiator's key share to timeout. The new member never gets synced.
-
-**Your task:**
-
-1. Read `docs/issues/sync-issue-2-key-share-blocks-on-near-rpc.md` and `docs/sync-investigation-2026-02-23.md` for full context.
-
-2. Add two new methods to `ContextClient` in `crates/context/primitives/src/client.rs`:
-
- ```rust
- /// Single NEAR view call to check membership on-chain (~200ms vs 10s+ for full sync)
- pub async fn check_member_on_chain(&self, context_id: &ContextId, public_key: &PublicKey) -> eyre::Result<bool>
-
- /// Add a member to local DB cache without full sync
- pub fn add_member_to_local_cache(&self, context_id: &ContextId, public_key: &PublicKey) -> eyre::Result<()>
- ```
-
- `check_member_on_chain` should use the existing `external_client.config().has_member()` path (single `has_member` NEAR view call). `add_member_to_local_cache` should insert a `ContextIdentity { private_key: None, sender_key: None }` into the datastore.
-
-3. In `crates/node/src/sync/manager.rs`, function `internal_handle_opened_stream` (line ~1925):
- - When `has_member` returns false, call `check_member_on_chain` instead of `sync_context_config`
- - If confirmed on-chain, call `add_member_to_local_cache` and proceed
- - If the single RPC fails, fall back to `sync_context_config` with a WARN log
- - If still not a member after fallback, bail as before
-
-4. Run tests:
- ```bash
- cargo test -p calimero-node-primitives
- cargo check -p calimero-node
- cargo build -p merod
- cd apps/sync-test && merobox bootstrap run --no-docker --binary-path ../../target/debug/merod --e2e-mode -v workflows/three-node-sync.yml
- ```
-
-**Acceptance criteria:**
-- `internal_handle_opened_stream` completes in <1s for new members (single RPC, not 3+)
-- Unknown members are still rejected if not on-chain
-- All existing tests pass
-- Merobox 3-node workflow passes
-
-**Key insight:** The receiver currently holds the stream open while doing 3+ NEAR RPCs. The initiator has a 10s timeout (`sync_config.timeout / 3`). A single `has_member` RPC (~200ms) fits easily within the budget. The full `sync_context_config` can run in the background afterward.
-
----
-
-## Agent 3: Skip key share when sender_key already cached
-
-**Branch**: `fix/sync-skip-cached-key-share`
-**Issue doc**: `docs/issues/sync-issue-3-redundant-key-share.md`
-
-### Prompt
-
-You are optimizing the sync protocol to skip the key share handshake when it's not needed. Currently, `initiate_sync_inner` runs the full 8-message key share protocol on every sync cycle (every 10s), even though the `sender_key` is persisted after the first successful exchange. This wastes 200ms per cycle on healthy peers and 10s on broken/relay peers.
-
-**Your task:**
-
-1. Read `docs/issues/sync-issue-3-redundant-key-share.md` and `docs/sync-investigation-2026-02-23.md` for full context.
-
-2. In `crates/node/src/sync/manager.rs`, function `initiate_sync_inner` (line ~1321):
- - After getting `our_identity`, check if we already have `sender_key` for all known members of this context
- - If all sender_keys are cached, skip `initiate_key_share_process` entirely
- - If any sender_key is missing, run key share as before
- - Log when key share is skipped: `"Skipping key share — sender_keys already cached"`
-
- The check: iterate `get_context_members(context_id, Some(false))` (all members, not just owned). For each, check `get_identity(context_id, member_id).sender_key.is_some()`. If all have sender_keys, skip.
-
-3. Additionally, add peer failure tracking to avoid wasting 10s on broken peers:
- - Add a `HashMap<(ContextId, PeerId), (Instant, u32)>` to `SyncManager` for tracking failed peers
- - After a key share failure, record the peer with timestamp and failure count
- - Before attempting sync with a peer, check if it's been failing recently (e.g., skip if failed in last 5 minutes and failure_count > 3)
- - Log when a peer is skipped: `"Skipping peer — recent key share failures"`
-
-4. Run tests:
- ```bash
- cargo test -p calimero-node-primitives
- cargo check -p calimero-node
- cargo build -p merod
- cd apps/sync-test && merobox bootstrap run --no-docker --binary-path ../../target/debug/merod --e2e-mode -v workflows/three-node-sync.yml
- ```
-
-5. Verify with the real Curb chat context (if available):
- - Join context `4BHG5RLqSPs9ewxUEKPAWSkA3xbrJ7Kz81K9fq24ST8h`
- - After initial key share, verify subsequent sync cycles show "Skipping key share"
- - Verify peer `12D3KooWK1jm...` is blacklisted after 3 failures
-
-**Acceptance criteria:**
-- Key share only runs on first encounter with a new peer identity
-- Subsequent sync cycles skip key share (log confirms)
-- Broken/relay peers are skipped after 3 consecutive failures
-- Sync cycle time drops from ~700ms to ~500ms in steady state (no key share overhead)
-- All existing tests pass
-- Merobox 3-node workflow passes
-
-**Important:** The key share skip MUST check sender_keys for all context members, not just the peer we're about to sync with. The stream is between two libp2p PeerIDs, but the key share exchanges sender_keys for context-specific identities. A single PeerID may host multiple context identities.
-
----
-
-## Shared Test Infrastructure
-
-All three agents share:
-
-- `apps/sync-test/` — WASM test app with write/read/snapshot/invitation methods
-- `apps/sync-test/workflows/three-node-sync.yml` — 3-node, 6-phase merobox workflow
-- `apps/sync-test/workflows/six-node-sync.yml` — 6-node stress test
-- `apps/sync-test/run-nodes.sh` — Manual 3-node launcher for meroctl debugging
-- `docs/sync-investigation-2026-02-23.md` — Full investigation with log analysis
-
-Build the test app:
-```bash
-cd apps/sync-test
-cargo build -p sync-test --target wasm32-unknown-unknown --profile app-release
-cp ../../target/wasm32-unknown-unknown/app-release/sync_test.wasm res/
-```
-
-## Merge Order
-
-1. **Agent 1** (mesh wait) — standalone, no dependencies
-2. **Agent 2** (RPC out of stream handler) — standalone, no dependencies
-3. **Agent 3** (skip cached key share) — can merge after 1 or 2, no hard dependency
-
-All three can be developed in parallel. Rule 2b (`protocol.rs`) is already implemented and tested on the current branch.
\ No newline at end of file
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 70% | Review time: 304.2s
🔴 5 critical, 🟡 3 warnings, 💡 2 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-3a2ff77f
Bugbot Issues — Status
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 4 total unresolved issues (including 2 from previous reviews).
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: AI agent prompts accidentally committed to repository
- Deleted docs/issues/sync-fix-agent-prompts.md which contained internal AI agent task prompts and development workflow details not meant for the repository.
- ✅ Fixed: Fragile string matching for token expiration detection
- Replaced string-based expiration detection with typed AuthError::InvalidToken pattern matching, consistent with mero_auth's own middleware approach.
Or push these changes by commenting:
@cursor push 650028b17b
Preview (650028b17b)
diff --git a/crates/server/src/auth.rs b/crates/server/src/auth.rs
--- a/crates/server/src/auth.rs
+++ b/crates/server/src/auth.rs
@@ -10,7 +10,7 @@
use futures_util::future::BoxFuture;
use futures_util::FutureExt;
use mero_auth::embedded::{build_app, default_config, EmbeddedAuthApp};
-use mero_auth::AuthService;
+use mero_auth::{AuthError, AuthService};
use tower::{Layer, Service};
use tracing::{debug, info, warn};
@@ -127,13 +127,14 @@
Ok(resp) => resp,
Err(e) => {
debug!(error = ?e, "Bearer token validation failed");
- let err_str = format!("{e}");
let mut resp = StatusCode::UNAUTHORIZED.into_response();
- if err_str.contains("expired") {
- resp.headers_mut().insert(
- "X-Auth-Error",
- "token_expired".parse().unwrap(),
- );
+ if let AuthError::InvalidToken(msg) = &e {
+ if msg.contains("expired") {
+ resp.headers_mut().insert(
+ "X-Auth-Error",
+ "token_expired".parse().unwrap(),
+ );
+ }
}
return Ok(resp);
}
@@ -156,13 +157,14 @@
Ok(resp) => resp,
Err(e) => {
debug!(error = ?e, "Query param token validation failed");
- let err_str = format!("{e}");
let mut resp = StatusCode::UNAUTHORIZED.into_response();
- if err_str.contains("expired") {
- resp.headers_mut().insert(
- "X-Auth-Error",
- "token_expired".parse().unwrap(),
- );
+ if let AuthError::InvalidToken(msg) = &e {
+ if msg.contains("expired") {
+ resp.headers_mut().insert(
+ "X-Auth-Error",
+ "token_expired".parse().unwrap(),
+ );
+ }
}
return Ok(resp);
}
diff --git a/docs/issues/sync-fix-agent-prompts.md b/docs/issues/sync-fix-agent-prompts.md
deleted file mode 100644
--- a/docs/issues/sync-fix-agent-prompts.md
+++ /dev/null
@@ -1,183 +1,0 @@
[contents identical to the docs/issues/sync-fix-agent-prompts.md deletion shown in the earlier autofix preview above; 183 duplicated lines omitted]
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 69% | Review time: 279.3s
🔴 5 critical, 🟡 3 warnings, 💡 2 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-9f5c0a71
Bugbot Issues — Updated Status
- Embed well-known dev keypair in mero-sign with --dev flag for deterministic ApplicationId (same package + dev signer = stable ID)
- Expose signer_id, package, version in Application admin API response
- Add meroctl dev start command (build + install + context + watch)
- Skip directory entries in .mpk tar extraction (fixes install crash)
- Don't delete source .mpk file after bundle installation

No blockchain/sandbox dependencies — context management is now local.

Made-with: Cursor
The server auth guard (AuthGuardService) now returns X-Auth-Error: token_expired when a JWT is expired, matching the behavior of the auth crate middleware. This enables the mero-js SDK to reactively refresh tokens on 401 instead of failing permanently. Both Bearer header and ?token= query param paths are covered. Made-with: Cursor
Removes FIX-HASH-COMPARISON-SYNC.md and docs/issues/sync-fix-agent-prompts.md which are internal debugging artifacts, not code documentation. Made-with: Cursor
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 62% | Review time: 321.1s
🔴 2 critical, 🟡 4 warnings, 💡 4 suggestions, 📝 1 nitpick. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-8f1e32a1
Made-with: Cursor
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 67% | Review time: 282.3s
🟡 2 warnings, 💡 7 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-1715af20
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Duplicated expired-token response logic in auth guard
  - Extracted the duplicated X-Auth-Error response construction logic into a reusable build_unauthorized_response helper function, eliminating the copy-pasted code in both the Authorization header and query param paths.
Or push these changes by commenting:
@cursor push 0899cca28d
Preview (0899cca28d)
diff --git a/crates/server/src/auth.rs b/crates/server/src/auth.rs
--- a/crates/server/src/auth.rs
+++ b/crates/server/src/auth.rs
@@ -16,6 +16,18 @@
use crate::config::ServerConfig;
+/// Build an unauthorized response from an auth error, inserting appropriate
+/// `X-Auth-Error` headers based on the error message content.
+fn build_unauthorized_response<E: std::fmt::Display>(err: E) -> Response {
+ let err_str = format!("{err}");
+ let mut resp = StatusCode::UNAUTHORIZED.into_response();
+ if err_str.contains("expired") {
+ resp.headers_mut()
+ .insert("X-Auth-Error", "token_expired".parse().unwrap());
+ }
+ resp
+}
+
/// The authenticated requester's public key, injected into request extensions
/// by [`AuthGuardService`] after token verification.
///
@@ -127,15 +139,7 @@
Ok(resp) => resp,
Err(e) => {
debug!(error = ?e, "Bearer token validation failed");
- let err_str = format!("{e}");
- let mut resp = StatusCode::UNAUTHORIZED.into_response();
- if err_str.contains("expired") {
- resp.headers_mut().insert(
- "X-Auth-Error",
- "token_expired".parse().unwrap(),
- );
- }
- return Ok(resp);
+ return Ok(build_unauthorized_response(e));
}
}
} else {
@@ -156,15 +160,7 @@
Ok(resp) => resp,
Err(e) => {
debug!(error = ?e, "Query param token validation failed");
- let err_str = format!("{e}");
- let mut resp = StatusCode::UNAUTHORIZED.into_response();
- if err_str.contains("expired") {
- resp.headers_mut().insert(
- "X-Auth-Error",
- "token_expired".parse().unwrap(),
- );
- }
- return Ok(resp);
+ return Ok(build_unauthorized_response(e));
}
}
}
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 67% | Review time: 266.9s
🟡 3 warnings, 💡 4 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-15bfc95d
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 75% | Review time: 271.9s
🟡 2 warnings, 💡 1 suggestion. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-1e74e172
sync-test:
- Removed unused Error enum and thiserror dependency
- Added get_writers() method (writers field was write-only)

dev.rs:
- Replaced block_on with try_send in file watcher callback (non-blocking)
- Added 500ms debounce after file change events
- Increased channel capacity from 1 to 4
- Sorted wasm files for deterministic selection in find_first_wasm_in
- Improved error message format for missing .wasm files

auth.rs:
- Extracted unauthorized_response() helper to deduplicate expired-token response logic between Bearer and query-param auth paths

Made-with: Cursor
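For illustration, here is a minimal sketch of the non-blocking, debounced watcher pattern this commit describes, assuming a notify-based watcher feeding a bounded tokio channel; `watch_loop` and the rebuild step are placeholder names, not the PR's actual code.

```rust
use std::time::Duration;

use notify::{RecommendedWatcher, RecursiveMode, Watcher};
use tokio::sync::mpsc;

async fn watch_loop(path: &std::path::Path) -> notify::Result<()> {
    // Bounded channel: if a wake-up is already queued, extra ones are simply dropped.
    let (tx, mut rx) = mpsc::channel::<()>(4);

    // The callback runs on notify's own thread, so it must never block;
    // try_send either enqueues a wake-up or fails silently when the queue is full.
    let mut watcher: RecommendedWatcher =
        notify::recommended_watcher(move |res: notify::Result<notify::Event>| {
            if res.is_ok() {
                let _ = tx.try_send(());
            }
        })?;
    watcher.watch(path, RecursiveMode::Recursive)?;

    while rx.recv().await.is_some() {
        // Debounce: wait 500ms, then drain whatever piled up in the meantime,
        // so a burst of file saves results in a single rebuild.
        tokio::time::sleep(Duration::from_millis(500)).await;
        while rx.try_recv().is_ok() {}

        // rebuild + reinstall + update_context_application would go here.
    }
    Ok(())
}
```

The bounded channel plus try_send keeps the watcher callback non-blocking, while the sleep-and-drain step collapses event bursts into a single reload.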
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 74% | Review time: 768.6s
🟡 1 warning, 💡 5 suggestions, 📝 1 nitpick. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-18c9efda
}

fn print_summary(
    &self,
🟡 Blocking async call in sync function called from async context
print_summary uses block_in_place + block_on to make async calls synchronously, but it is called from the async run() method; this is inefficient and could cause runtime issues.
Suggested fix:
Make `print_summary` an async fn and await the `get_application` call directly.
let client = environment.client()?;
let node_url = client.api_url();

let app_response = tokio::task::block_in_place(||
💡 Unnecessary block_in_place in async context
Using block_in_place to call an async function is unusual here; making the enclosing function async would let the call be awaited directly.
Suggested fix:
Make `print_summary` async and await the `get_application` call directly.
    member_public_key,
)
.await?;
}
💡 Complex dev workflow logic lacks unit tests
Functions like resolve_artifact, find_wasm_in_project, and find_or_create_context have branching logic that would benefit from unit tests.
Suggested fix:
Add unit tests for artifact resolution, build detection, and context reuse logic.
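If such tests were added, they could look roughly like the sketch below. The resolve_artifact signature, its error type, and the tempfile dev-dependency are assumptions here, not the PR's actual code.

```rust
#[cfg(test)]
mod tests {
    use super::*;

    // Assumes a helper roughly like `fn resolve_artifact(path: &Path) -> Result<...>`
    // and `tempfile` as a dev-dependency; adjust to the real API.
    #[test]
    fn resolves_a_wasm_file_directly() {
        let dir = tempfile::tempdir().unwrap();
        let wasm = dir.path().join("app.wasm");
        std::fs::write(&wasm, b"\0asm").unwrap();

        assert!(resolve_artifact(&wasm).is_ok());
    }

    #[test]
    fn rejects_an_unsupported_extension() {
        let dir = tempfile::tempdir().unwrap();
        let other = dir.path().join("app.txt");
        std::fs::write(&other, b"not an artifact").unwrap();

        assert!(resolve_artifact(&other).is_err());
    }
}
```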
pub fn write(&mut self, key: String, value: String, writer: String) -> app::Result<()> {
    app::log!("[{}] write: {} = {}", writer, key, value);

    self.entries.insert(key.clone(), value.into())?;
💡 Redundant data storage in writers map
Key is writer:key and value is writer - the value is derivable from the key, violating DRY.
Suggested fix:
Store only the key, or use a Set-like structure if you only need to track writer associations.
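To make the suggestion concrete, here is a minimal sketch with plain std collections standing in for the SDK ones (the real app state uses the Calimero SDK collection types, and these field names are illustrative):

```rust
use std::collections::{BTreeMap, BTreeSet};

// Illustrative state shape: a set of (writer, key) pairs replaces the
// "writer:key" -> writer map, since the value was derivable from the key.
struct KvState {
    entries: BTreeMap<String, String>,
    writers: BTreeSet<(String, String)>,
}

impl KvState {
    fn write(&mut self, key: String, value: String, writer: String) {
        // Record membership first, while `key` only needs one cheap clone.
        self.writers.insert((writer, key.clone()));
        self.entries.insert(key, value);
    }

    fn writers_of(&self, key: &str) -> Vec<&str> {
        self.writers
            .iter()
            .filter(|(_, k)| k == key)
            .map(|(w, _)| w.as_str())
            .collect()
    }
}
```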
    false,
))
}
}
💡 watch_and_reload has high complexity with many parameters
Function takes 8 parameters and handles building, installing, and watching; consider extracting a DevSession struct to encapsulate this state.
Suggested fix:
Create a `DevSession` struct holding `context_id`, `artifact_path`, `project_path`, `metadata`, etc., and move the logic into methods.
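A sketch of what that refactor could look like; the field set is guessed from the parameters mentioned in this review and the types are simplified (the real code would use the crate's ContextId, ApplicationId, and PublicKey types), so treat it as a shape, not the PR's implementation.

```rust
use std::path::PathBuf;

/// Hypothetical grouping of the state currently threaded through ~8 parameters.
struct DevSession {
    context_id: String,
    application_id: String,
    member_public_key: String,
    artifact_path: PathBuf,
    project_path: PathBuf,
    metadata: Vec<u8>,
}

impl DevSession {
    /// Rebuild the artifact, reinstall it, and repoint the context at the new app.
    async fn rebuild_and_reinstall(&mut self) -> Result<(), Box<dyn std::error::Error>> {
        // build step -> install-dev-application -> update_context_application
        Ok(())
    }

    /// Watch `project_path` and call `rebuild_and_reinstall` on changes.
    async fn watch(&mut self) -> Result<(), Box<dyn std::error::Error>> {
        // debounced file-watch loop calling rebuild_and_reinstall()
        Ok(())
    }
}
```

Besides shrinking the parameter list, a struct like this gives the rebuild and watch steps a shared place to hang unit tests.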
}

/// Write a key-value pair, tagged with the writer's alias.
pub fn write(&mut self, key: String, value: String, writer: String) -> app::Result<()> {
💡 Unnecessary String clone in write method
key is cloned when it is passed to entries.insert so that it can still be used afterwards, but the clone could be avoided by reordering operations.
Suggested fix:
Create `writer_key` before calling `entries.insert(key, ...)` to avoid the clone.
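Concretely, the reordering could look roughly like this (assuming the writers entry is keyed by "writer:key" as described above; the writers.insert signature is an assumption):

```rust
pub fn write(&mut self, key: String, value: String, writer: String) -> app::Result<()> {
    app::log!("[{}] write: {} = {}", writer, key, value);

    // Build the derived key while `key` is still borrowed, then move `key`
    // into the entries map -- no clone needed.
    let writer_key = format!("{writer}:{key}");

    self.entries.insert(key, value.into())?;
    self.writers.insert(writer_key, writer)?;

    Ok(())
}
```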
eprintln!();
eprintln!(" Application: {package_display}");
eprintln!(" AppId: {application_id}");
eprintln!(" Context: {context_id}");
📝 Nit: resolve_artifact could document supported input types
The function handles .wasm files, .mpk bundles, and project directories but this isn't documented in a doc comment.
Suggested fix:
Add a brief doc comment explaining the three input types and resolution behavior.
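For example, something along these lines; the signature and return type shown are assumptions, only the doc comment is the point:

```rust
use camino::{Utf8Path, Utf8PathBuf};

/// Resolve the artifact to install from `path`.
///
/// Supported inputs:
/// - a `.wasm` file: used directly;
/// - an `.mpk` bundle: installed through the bundle path;
/// - a project directory: built first, then the resulting `.wasm` is picked up.
fn resolve_artifact(path: &Utf8Path) -> eyre::Result<Utf8PathBuf> {
    // existing resolution logic, unchanged
    let _ = path;
    todo!()
}
```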


Summary
- Embed well-known dev keypair in mero-sign with --dev flag for deterministic ApplicationId
- Expose signer_id, package, version in Application admin API response
- Add meroctl dev start command (build + sandbox + install + context + watch)
- Make /packages/:package/latest a public endpoint for auth-frontend
- Skip directory entries in .mpk tar extraction (fixes install crash)
- Don't delete source .mpk file after bundle installation

Test plan
- mero-sign tests pass (6 tests including dev key determinism)
- cargo check passes for all affected crates
- packageName — node lookup skips registry

Related PRs
- feat/dev-mode
- feat/dev-mode
- feat/dev-mode
- feat/dev-mode
- feat/dev-mode

Note
Medium Risk
Touches dev installation flows and application metadata surfaced via admin APIs, plus tweaks authentication error responses; regressions could affect developer UX and client auth handling. Runtime behavior for normal users should be mostly unchanged, but bundle install/extraction changes could impact app install reliability.
Overview
- Developer mode support: introduces a new apps/sync-test WASM app plus scripted workflows (three-node-sync.yml, six-node-sync.yml) and helper scripts to reproduce multi-node sync/convergence scenarios.
- Local dev tooling: adds meroctl dev start to build/resolve a .wasm/.mpk or project dir, install it via install-dev-application, create or reuse a context, and optionally watch sources to auto-reinstall and update_context_application.
- Bundle/dev identity plumbing: extends Application responses to include signer_id, package, and version (populated in node client get_application/list_applications), and updates mero-sign with a well-known --dev signing key option for deterministic dev signerId (plus tests).
- Stability fixes: bundle extraction now skips directory entries (avoids .mpk install crash) and bundle installs no longer delete the source .mpk; auth guard now returns 401 with X-Auth-Error: token_expired when token verification errors indicate expiration.

Written by Cursor Bugbot for commit 3bae139. This will update automatically on new commits.