
feat(storage): migrate to iii-sdk 0.19 with config-worker integration + hardened reload #236

Merged
andersonleal merged 1 commit into main from storage-sdk-config-integration on Jun 9, 2026


@andersonleal

@andersonleal andersonleal commented Jun 8, 2026

Collaborator

Summary

Migrate the storage worker to iii-sdk 0.19.0 and adopt the configuration-worker pattern (mirroring the database worker), then harden the config-reload path against two adversarial-review findings.

Migration

  • Bump iii-sdk from =0.16.0-next.2 to =0.19.0 (drop-in; verified against every call site).
  • Register a config schema with the configuration worker, fetch live config over RPC, and hot-reload backends on configuration:updated.
  • Register the four storage::* functions inline in main.rs (drop the register_all helper).
  • --config becomes an optional seed (used only for initial_value on first register); the configuration worker is the live source of truth.
  • manifest.rs / --manifest, the rustfs sidecar, webhook receiver, and SQS/Pub-Sub/CF-Queue pollers are preserved. No iii-observability added.
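The seed-versus-live split in the --config bullet can be modeled in memory to make the intended semantics concrete. This is a sketch under stated assumptions: ConfigStore, register, fetch, and update are hypothetical names standing in for the configuration worker, which the PR reaches over RPC (configuration::get) through an iii-sdk API not shown here. It only illustrates that the seed is written on first registration and does not overwrite an already-stored value, while reads always come from the store.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the configuration worker's store.
#[derive(Default)]
pub struct ConfigStore {
    entries: HashMap<String, String>,
}

impl ConfigStore {
    /// Register a config id. In this model the optional seed is written only
    /// when nothing is stored yet, so re-registering never clobbers live edits.
    pub fn register(&mut self, id: &str, initial_value: Option<String>) {
        if let Some(seed) = initial_value {
            self.entries.entry(id.to_string()).or_insert(seed);
        }
    }

    /// Read the authoritative value, falling back to a default when unset.
    pub fn fetch(&self, id: &str, default: &str) -> String {
        self.entries
            .get(id)
            .cloned()
            .unwrap_or_else(|| default.to_string())
    }

    /// Simulates an operator edit made through the configuration worker.
    pub fn update(&mut self, id: &str, value: String) {
        self.entries.insert(id.to_string(), value);
    }
}
```

With this shape, restarting the worker with the same --config seed after an operator edit still yields the edited value, because the store, not the seed file, is the source of truth.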

Security hardening (storage + database)

  • *::on-config-change now re-fetches the authoritative config via configuration::get and never trusts the trigger payload — closes an unauthenticated reconfiguration vector on a discoverable bus function.
  • storage refuses hot-reloads that change bucket/notification topology (WorkerConfig::topology() compared in reloadable()), preventing a backend/notification split-brain. Only backend-connection changes (credentials, endpoint, path-style) hot-apply; topology changes require a restart (logged).
  • iii-permissions.yaml denies agent invocation of the internal *::on-config-change functions (defense-in-depth; does not affect engine trigger delivery).
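The topology guard can be sketched as "project the config down to its boot-pinned fields and compare by value". This is a simplified, hypothetical model: the field names below are assumptions, and the PR's BucketTopology also carries a NotificationKey (including the CF-Queue api_token) rather than the plain string used here. Only the technique, comparing a topology signature captured at boot against the re-fetched config, mirrors what the PR describes.

```rust
use std::collections::BTreeMap;

/// Boot-pinned wiring for one bucket (simplified, hypothetical fields).
#[derive(Clone, PartialEq, Eq)]
pub struct BucketTopology {
    pub provider: String,
    pub underlying_bucket: String,
    pub notification: Option<String>,
}

/// Connection settings that, per the PR, may hot-apply.
#[derive(Clone)]
pub struct Connection {
    pub endpoint: String,
    pub access_key: String,
    pub path_style: bool,
}

#[derive(Clone)]
pub struct BucketConfig {
    pub topology: BucketTopology,
    pub connection: Connection,
}

#[derive(Clone)]
pub struct WorkerConfig {
    /// Keyed by logical bucket name.
    pub buckets: BTreeMap<String, BucketConfig>,
}

/// Topology signature: logical bucket name -> boot-pinned wiring.
pub type Topology = BTreeMap<String, BucketTopology>;

impl WorkerConfig {
    /// Keep only the fields that boot-time pollers and webhooks depend on.
    pub fn topology(&self) -> Topology {
        self.buckets
            .iter()
            .map(|(name, b)| (name.clone(), b.topology.clone()))
            .collect()
    }
}

/// Accept a hot-reload only when the topology signature is unchanged.
pub fn reloadable(boot: &Topology, next: &WorkerConfig) -> Result<(), String> {
    if next.topology() == *boot {
        Ok(())
    } else {
        Err("bucket/notification topology changed; restart required".to_string())
    }
}
```

Comparing the whole map also catches added or removed buckets, not just edits to existing ones, which is what prevents a new backend from appearing without a matching boot-pinned notification source.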

Test plan

  • storage: cargo build, cargo test (91 lib + integration put_then_get_round_trips_via_rustfs + manifest + schema), cargo clippy --all-targets -- -D warnings — green
  • database: cargo build, cargo test (198 lib + 6 integration), cargo clippy --all-targets -- -D warnings — green
  • CI green
  • Reviewer confirms topology() covers all boot-pinned wiring dimensions and the NotificationKey::CfQueue api_token inclusion (CF-Queue poller is boot-pinned to it — restart required on rotation) is acceptable

Notes

  • The database security fix rides this branch but is logically separable (database was not part of the storage migration) — can be split into its own PR if preferred.

Summary by CodeRabbit

  • New Features

    • Added JSON configuration support with automatic schema generation for validation.
    • Implemented hot-reload capability for configuration changes without worker restart.
    • Added topology validation to prevent configuration updates that would alter bucket structure.
  • Documentation

    • Updated configuration guide with live configuration flow and local development setup instructions.
  • Security

    • Added deny rules to prevent unauthorized access to internal configuration reload handlers.
  • Chores

    • Updated dependencies to latest versions.

… + hardened reload

Bump storage to iii-sdk 0.19.0 and adopt the configuration-worker pattern
(mirroring the database worker): register a config schema, fetch live config
over RPC, hot-reload backends on change, and register the storage::* functions
inline. --config becomes an optional seed; manifest.rs and the rustfs
sidecar / webhook receiver / SQS-PubSub-CF pollers are preserved.

Config-reload security hardening (storage + database), fixing two
adversarial-review findings:
- on-config-change re-fetches the authoritative config via configuration::get
  and never trusts the trigger payload, closing an unauthenticated
  reconfiguration vector on a discoverable bus function.
- storage refuses hot-reloads that change bucket/notification topology
  (WorkerConfig::topology signature compared in reloadable()), preventing a
  backend/notification split-brain; only backend-connection changes
  (credentials, endpoint, path-style) hot-apply.
- iii-permissions.yaml denies agent invocation of the internal
  *::on-config-change functions as defense-in-depth.

Verified: storage 91 lib + integration + clippy -D warnings; database 198 lib
+ 6 integration + clippy -D warnings, all green.
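The "never trust the trigger payload" rule reduces to a handler that accepts the payload only to match the callback shape and re-reads the authoritative value itself. A minimal sketch, assuming a hypothetical ConfigSource trait in place of the configuration::get RPC (the real iii-sdk call and its signature are not shown in the PR) and a caller-supplied apply callback in place of the backend rebuild:

```rust
/// Hypothetical abstraction over the configuration::get RPC.
pub trait ConfigSource {
    fn get(&self, id: &str) -> Result<String, String>;
}

/// Config-change handler sketch. The payload parameter exists only to match
/// the trigger's handler shape; its contents are never read.
pub fn on_config_change<S: ConfigSource>(
    source: &S,
    _payload: &str,
    apply: &mut dyn FnMut(String),
) -> Result<(), String> {
    // Re-fetch the authoritative config instead of trusting the caller.
    let authoritative = source.get("storage")?;
    apply(authoritative);
    Ok(())
}
```

Because the payload is ignored, anyone able to invoke the bus function can at most cause a redundant re-fetch; they cannot inject a config, which is the property the fix relies on.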
@vercel

vercel Bot commented Jun 8, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
workers | Ready | Preview, Comment | Jun 8, 2026 9:22pm


@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

skill-check — worker

0 verified, 14 skipped (no docs/).

Layers checked: structure, vale, ai, render

Four for four. Nicely done.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026


📝 Walkthrough


Storage and database workers now delegate configuration management to a centralized configuration worker. Configuration types gain serialization and topology modeling; storage backends are protected by RwLock for concurrent access during hot-reload; and request handlers await async backend lookup. Configuration changes trigger handlers that refetch authoritative config, validate topology immutability, and rebuild backends atomically.

Changes

Configuration-Driven Backend Hot-Reload

Layer (files): summary

  • Configuration type modeling and serialization (storage/src/config.rs, storage/src/backend/factory.rs, tests/...): Topology, BucketTopology, and NotificationKey types model the provider, underlying bucket name, and notification identity. Config structs derive Serialize and JsonSchema, enabling JSON round-tripping. WorkerConfig adds from_json, to_json, topology, and json_schema methods. LocalBackendCtx derives Clone.
  • AppState refactoring for concurrent backend access (storage/src/handlers/mod.rs): AppState.backends changes to Arc<RwLock<HashMap<...>>>. The backend method becomes async, acquiring a read lock and returning a cloned Arc<dyn Backend>. The register_all entrypoint and per-RPC registration helpers are removed.
  • Configuration integration module and hot-reload logic (storage/src/configuration.rs, storage/src/lib.rs): the new module registers the storage configuration schema with the configuration worker (optionally seeding it), fetches the authoritative stored configuration with a fallback to defaults, and provides hot-reload via apply_config (atomic replacement of the backend map under the write lock). register_config_trigger binds an async handler to configuration update events; the handler refetches config, validates topology reloadability (rejecting bucket/notification topology changes), and applies the rebuilt backends. Retry logic handles the configuration RPC calls.
  • Handler updates for async backend access (storage/src/handlers/{put_object,get_object,delete_object,presign_url}.rs, tests/...): all four request handlers await state.backend(&bucket) before mapping errors. Test helpers wrap backends in Arc<RwLock<...>> and initialize local_ctx: None.
  • Main worker initialization and RPC/trigger registration (storage/src/main.rs): the CLI config path becomes an optional seed. Startup conditionally loads the seed config, calls configuration::register_config (with the optional seed), then configuration::fetch_config to retrieve the authoritative config. Backend initialization uses configuration::build_backends and extends AppState with local_ctx. RPC registration moves from the centralized register_all to four explicit inline iii.register_function calls, and configuration trigger registration is added.
  • Database handler adaptation for config-driven updates (database/src/configuration.rs): handler registration binds an async closure that clones the III engine and invokes on_config_change. The internal handler refetches the authoritative database configuration via fetch_config (ignoring the trigger payload), logging fetch and rebuild failures separately while preserving existing pools on error.
  • Security rules, dependency updates, and documentation (iii-permissions.yaml, storage/Cargo.toml, storage/README.md): permission rules deny storage::on-config-change and database::on-config-change so agents cannot call the internal config reload handlers. The iii-sdk dependency is updated to =0.19.0. Documentation explains the live configuration flow via the configuration worker, schema registration, refusal of topology-changing hot-reloads, and local development seed setup.

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • iii-hq/workers#194: Both PRs refactor database/src/configuration.rs register_config_trigger and the internal config-change handler to rebuild pools from authoritative configuration instead of trigger payload, establishing the pattern that the main PR applies to storage.
  • iii-hq/workers#91: The main PR's storage refactor (AppState with RwLock, async backend(), removal of register_all in favor of explicit iii.register_function wiring, and config-trigger hookup) directly overlaps with PR #91's initial storage worker implementation of these handler/state/registration patterns.

Suggested reviewers

  • sergiofilhowz
  • ytallo

Poem

🐰 Configuration Dreams

Backends locked in RwLock's embrace,
Hot-reload changes—with topology grace,
No more payloads, just truth in the store,
Configuration speaks, and workers adore! 🔄✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main changes: iii-sdk 0.19 migration, config-worker integration, and hardened reload mechanism.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@storage/src/config.rs`:
- Around line 35-52: The derived Debug on NotificationKey leaks
CfQueue.api_token; replace the auto-derive with a manual impl of std::fmt::Debug
for NotificationKey that prints Sqs, Pubsub, and RustfsWebhook normally but
redacts the CfQueue.api_token (e.g. show account_id and queue_id but replace
api_token with "<redacted>" or similar). Keep the existing Clone/PartialEq/Eq
derives intact and ensure the Debug impl covers all NotificationKey variants to
avoid accidental token exposure in logs or test-assert messages.

In `@storage/src/configuration.rs`:
- Around line 128-137: The registered async handler for CONFIG_FN_ID is
acknowledging success unconditionally; change RegisterFunction::new_async's
closure to await on_config_change(&engine, &st, &boot_topology) and inspect its
Result, returning Ok(json!({ "ok": true })) only on success and returning an
Err(IIIError) (with context/log message) when fetch_config, reloadable, or
apply_config fail; apply the same fix to the other registration block referenced
(the analogous code around the second handler) so storage::on-config-change
failures propagate back to the caller instead of being silently acknowledged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 64e0e276-acf6-470a-8e11-921c992040c5

📥 Commits

Reviewing files that changed from the base of the PR and between 66c1701 and 8f70c33.

⛔ Files ignored due to path filters (1)
  • storage/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • database/src/configuration.rs
  • iii-permissions.yaml
  • storage/Cargo.toml
  • storage/README.md
  • storage/src/backend/factory.rs
  • storage/src/config.rs
  • storage/src/configuration.rs
  • storage/src/handlers/delete_object.rs
  • storage/src/handlers/get_object.rs
  • storage/src/handlers/mod.rs
  • storage/src/handlers/presign_url.rs
  • storage/src/handlers/put_object.rs
  • storage/src/lib.rs
  • storage/src/main.rs

Comment thread storage/src/config.rs
Comment on lines +35 to +52
/// Canonical identity of a bucket's notification source — exactly what the
/// boot-time pollers/webhook key on. Compared by value; never logged.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NotificationKey {
    Sqs {
        queue_url: String,
        region: String,
    },
    Pubsub {
        subscription: String,
    },
    CfQueue {
        account_id: String,
        queue_id: String,
        api_token: String,
    },
    RustfsWebhook,
}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Redact the CF Queue token from Debug.

NotificationKey now derives Debug while CfQueue carries api_token. Any {:?} on NotificationKey/BucketTopology/Topology or a failing equality assertion will print the raw token. The surrounding R2 config already redacts equivalent secrets, so this reintroduces a leak path.

🔐 Suggested fix
-#[derive(Debug, Clone, PartialEq, Eq)]
+#[derive(Clone, PartialEq, Eq)]
 pub enum NotificationKey {
     Sqs {
         queue_url: String,
         region: String,
@@
     RustfsWebhook,
 }
+
+impl std::fmt::Debug for NotificationKey {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            Self::Sqs { queue_url, region } => f
+                .debug_struct("Sqs")
+                .field("queue_url", queue_url)
+                .field("region", region)
+                .finish(),
+            Self::Pubsub { subscription } => f
+                .debug_struct("Pubsub")
+                .field("subscription", subscription)
+                .finish(),
+            Self::CfQueue {
+                account_id,
+                queue_id,
+                api_token,
+            } => f
+                .debug_struct("CfQueue")
+                .field("account_id", account_id)
+                .field("queue_id", queue_id)
+                .field("api_token", &redact_secret(api_token))
+                .finish(),
+            Self::RustfsWebhook => f.write_str("RustfsWebhook"),
+        }
+    }
+}
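A self-contained version of the suggested fix makes the redaction easy to check in isolation. Assumption: redact_secret below is a hypothetical stand-in for the crate's existing helper, whose real output format is not shown in the review; everything else follows the suggestion (Clone/PartialEq/Eq kept as derives, manual Debug covering every variant).

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's redact_secret helper.
fn redact_secret(secret: &str) -> &'static str {
    if secret.is_empty() { "<empty>" } else { "<redacted>" }
}

#[derive(Clone, PartialEq, Eq)]
pub enum NotificationKey {
    Sqs { queue_url: String, region: String },
    Pubsub { subscription: String },
    CfQueue { account_id: String, queue_id: String, api_token: String },
    RustfsWebhook,
}

impl fmt::Debug for NotificationKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Sqs { queue_url, region } => f
                .debug_struct("Sqs")
                .field("queue_url", queue_url)
                .field("region", region)
                .finish(),
            Self::Pubsub { subscription } => f
                .debug_struct("Pubsub")
                .field("subscription", subscription)
                .finish(),
            // Only the token is masked; ids stay visible for debugging.
            Self::CfQueue { account_id, queue_id, api_token } => f
                .debug_struct("CfQueue")
                .field("account_id", account_id)
                .field("queue_id", queue_id)
                .field("api_token", &redact_secret(api_token))
                .finish(),
            Self::RustfsWebhook => f.write_str("RustfsWebhook"),
        }
    }
}
```

Because equality is still derived, the topology comparison keeps using the real token value; only its rendering through {:?} (including failing assert_eq! messages) changes.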

Comment on lines +128 to +137
iii.register_function(
    CONFIG_FN_ID,
    RegisterFunction::new_async(move |_payload: Value| {
        let st = st.clone();
        let engine = engine.clone();
        let boot_topology = boot_topology.clone();
        async move {
            on_config_change(&engine, &st, &boot_topology).await;
            Ok::<Value, IIIError>(json!({ "ok": true }))
        }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't ack failed config reloads as successful.

The registered storage::on-config-change function always returns success, even when fetch_config, reloadable, or apply_config fail. That drops the configuration:updated event after logging and leaves the worker on stale backends until some later config change happens to retrigger reload.

Also applies to: 164-193
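The reviewer's suggestion amounts to making on_config_change return a Result and acknowledging only on Ok. A minimal sketch, assuming hypothetical closures in place of fetch_config, reloadable, and apply_config, and plain String results in place of the iii-sdk Value/IIIError types (whose constructors are not shown here). Whether the engine redelivers a configuration:updated event after an error is not stated in the PR, so this only shows the error being surfaced rather than swallowed.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
pub enum ReloadError {
    Fetch(String),
    TopologyChanged,
    Apply(String),
}

impl fmt::Display for ReloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReloadError::Fetch(e) => write!(f, "fetch_config failed: {e}"),
            ReloadError::TopologyChanged => write!(f, "topology change requires restart"),
            ReloadError::Apply(e) => write!(f, "apply_config failed: {e}"),
        }
    }
}

/// Each reload step can fail, and every failure is returned, not just logged.
pub fn on_config_change(
    fetch: impl Fn() -> Result<String, String>,
    reloadable: impl Fn(&str) -> bool,
    apply: impl FnOnce(String) -> Result<(), String>,
) -> Result<(), ReloadError> {
    let cfg = fetch().map_err(ReloadError::Fetch)?;
    if !reloadable(&cfg) {
        return Err(ReloadError::TopologyChanged);
    }
    apply(cfg).map_err(ReloadError::Apply)
}

/// Handler body: acknowledge only on success; otherwise surface the error.
pub fn handle(result: Result<(), ReloadError>) -> Result<String, String> {
    match result {
        Ok(()) => Ok(r#"{"ok":true}"#.to_string()),
        Err(e) => Err(format!("storage::on-config-change failed: {e}")),
    }
}
```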


@andersonleal andersonleal merged commit 96e6e94 into main Jun 9, 2026
18 checks passed
@andersonleal andersonleal deleted the storage-sdk-config-integration branch June 9, 2026 12:14