fix(iot): guard CouchDB construction and add first-run DX scaffolding #322

Open
iksnerd wants to merge 1 commit into IBM:main from iksnerd:fix/iot-startup-and-first-run-dx

iksnerd commented May 24, 2026

Description

The iot MCP server crashed on startup when COUCHDB_URL was unset: couchdb3.Database(None, url=None, ...) raised inside __init__, leaving a partially-constructed instance whose __del__ then double-faulted with AttributeError: 'Database' object has no attribute 'session'.
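The double fault can be reproduced without CouchDB. The sketch below is a minimal illustration, assuming a hypothetical `FakeDatabase` class that mimics the described behaviour; it is not couchdb3's actual code, whose internals are not shown in this PR.

```python
import gc
import sys


class FakeDatabase:
    """Hypothetical stand-in for a client class; not couchdb3's real code."""

    def __init__(self, name, url=None):
        # With url=None this line raises before self.session is assigned.
        if not url.startswith("http"):
            raise ValueError(f"unsupported URL: {url!r}")
        self.session = object()

    def __del__(self):
        # Runs even when __init__ failed, because the object already exists.
        self.session  # AttributeError: 'session' was never set


def reproduce():
    """Return (construction error, errors raised inside __del__) as strings."""
    finalizer_errors = []
    previous_hook = sys.unraisablehook
    # Errors raised in __del__ cannot propagate; Python reports them here.
    sys.unraisablehook = lambda u: finalizer_errors.append(str(u.exc_value))
    try:
        try:
            FakeDatabase(None, url=None)
        except AttributeError as exc:
            construction_error = str(exc)
        gc.collect()  # make sure the half-built object has been finalized
    finally:
        sys.unraisablehook = previous_hook
    return construction_error, finalizer_errors
```

Calling `reproduce()` yields one error from `__init__` and a second one from `__del__`, which is the pattern described above: the first failure leaves an object without `session`, and the finalizer then trips over the missing attribute.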

This PR guards the construction so the server stays startable for tool discovery even when CouchDB env vars are missing — tool calls that need the DB now return a clean error instead. It also bundles three small first-run DX improvements that surfaced while investigating.
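The guard can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/servers/iot/main.py`: `build_db`, `db_unavailable_error`, the `factory` parameter, and the error payload shape are all hypothetical names introduced here.

```python
import logging

logger = logging.getLogger("iot-mcp-server")


def build_db(url, dbname, factory):
    """Return a database handle, or None if config is missing or connecting fails.

    `factory` stands in for the real client constructor (hypothetical parameter).
    """
    db = None  # defined up front so later code can always test `db is None`
    if url and dbname:
        try:
            db = factory(dbname, url=url)
        except Exception as exc:
            logger.error("Failed to connect to CouchDB: %s", exc)
    else:
        # The constructor is never called with None, so no half-built object exists.
        logger.warning(
            "CouchDB env vars not set; tool calls that need CouchDB will return an error."
        )
    return db


def db_unavailable_error(db):
    """Return an error payload when no database is available, else None.

    The payload shape is an assumption made for this sketch.
    """
    if db is None:
        return {"error": "CouchDB is not connected"}
    return None
```

In the server, `factory` would presumably be `couchdb3.Database`, with `url` and `dbname` read from the environment; a tool would call `db_unavailable_error(db)` first and return its result when it is not `None`.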

Fix Details

  • src/servers/iot/main.py — initialize db = None before the construction attempt and guard on COUCHDB_URL and COUCHDB_DBNAME being set. Skipping construction avoids the partially-constructed-Database trap entirely; a WARNING is logged so the failure mode is visible. Also removed an unused from functools import lru_cache flagged by ruff.
  • .env.public — added VIBRATION_DBNAME=vibration. Documented in docs/mcp-servers.md:93 but missing from the example env file, causing the vibration server to start without its DB name.
  • .mcp.json.example — new file at repo root with stdio entries for the six *-mcp-server console scripts. Any MCP client (e.g. MCP Inspector) can cp .mcp.json.example .mcp.json and have all servers wired up. .mcp.json is gitignored so user-local copies stay out of git.
  • docs/mcp-servers.md — added an "Inspecting a server directly" section covering the MCP Inspector UI, the .mcp.json.example template, and a raw stdio JSON-RPC handshake recipe. Also extended the wo "Data init" line to note that an *_not_available error means the CouchDB container's seeding step didn't run.
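For readers unfamiliar with the file, the shape of such a config can be sketched in a few lines. The snippet assumes the `mcpServers` layout commonly used by MCP clients for stdio servers; the actual contents of `.mcp.json.example` are not reproduced in this description, and only `iot-mcp-server` is a script name confirmed here. The helper `mcp_config` is hypothetical.

```python
import json


def mcp_config(scripts):
    """Build an `.mcp.json`-style mapping from server name to console script."""
    return {
        "mcpServers": {
            name: {"command": command, "args": []}
            for name, command in scripts.items()
        }
    }


# Only `iot-mcp-server` is named in this PR; the other five entries are omitted.
print(json.dumps(mcp_config({"iot": "iot-mcp-server"}), indent=2))
```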

Impact on Benchmarking

  • No change to baselines: This fix only improves stability and first-run DX. No tool behavior changes; the iot server's tools return identical responses when CouchDB is reachable. The only observable difference is when env vars are unset, where the server now emits a clean WARNING instead of an ERROR followed by an AttributeError traceback.

Related Issues

  • Fixes: (no existing issue — surfaced during a hands-on review of the MCP server cold-start path)

Verification Steps

  1. Unit tests: uv run pytest src/ -k "not integration" → 311 passed, 3 skipped, 50 deselected.
  2. End-to-end cold-start probe with no env vars (the original bug scenario):
    env -i HOME=$HOME PATH=$PATH .venv/bin/iot-mcp-server  # stdio + JSON-RPC initialize + tools/list
    • Before this PR: stderr emits ERROR:iot-mcp-server:Failed to connect to CouchDB: 'NoneType' object has no attribute 'startswith' plus AttributeError: 'Database' object has no attribute 'session' on shutdown.
    • After this PR: stderr emits a single WARNING:iot-mcp-server:CouchDB env vars not set (COUCHDB_URL, IOT_DBNAME); tool calls that need CouchDB will return an error. No traceback. tools/list returns the same four tools.
  3. Ruff: uvx ruff format --check src/servers/iot/main.py → clean; uvx ruff check src/servers/iot/main.py → All checks passed!
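The cold-start probe in step 2 can be scripted. The following is a sketch, assuming the server speaks MCP over stdio as newline-delimited JSON-RPC and exits once stdin is closed; the function names, the client name, and the protocol version string `2024-11-05` are choices made for this example, not taken from the PR.

```python
import json
import os
import subprocess


def handshake_lines(protocol_version="2024-11-05"):
    """Return the JSON-RPC messages for initialize followed by tools/list."""
    messages = [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": protocol_version,
                "capabilities": {},
                "clientInfo": {"name": "cold-start-probe", "version": "0.0.0"},
            },
        },
        # Notifications carry no id and expect no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]
    return [json.dumps(message) for message in messages]


def probe(command, timeout=10.0):
    """Run `command` with only HOME and PATH set; return (stdout, stderr)."""
    env = {key: os.environ[key] for key in ("HOME", "PATH") if key in os.environ}
    payload = "\n".join(handshake_lines()) + "\n"
    result = subprocess.run(
        [command],
        input=payload,
        capture_output=True,
        text=True,
        env=env,
        timeout=timeout,
    )
    return result.stdout, result.stderr
```

Something like `stdout, stderr = probe(".venv/bin/iot-mcp-server")` would then let the WARNING line and the absence of a traceback be checked on `stderr`. Sending all messages up front is the simplest approach; a stricter client would wait for the `initialize` response before continuing.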

Pre-existing test failures observed (not caused by this PR)

While running the full unit suite, src/observability/tests/test_file_exporter.py::test_file_exporter_writes_jsonl and ::test_file_exporter_appends fail with ModuleNotFoundError: No module named 'google.protobuf'. The base install does not include protobuf — it's only pulled in via uv sync --group otel. These tests likely want a skipif guard (similar to the integration suites in CouchDB / WATSONX). Happy to file a separate issue if useful.
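One possible shape for such a guard, as a sketch only: the helper name `module_available` is hypothetical, and the existing integration suites may use a different mechanism.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be imported, without importing it fully.

    For dotted names, find_spec imports the parent package first and raises
    ModuleNotFoundError when that parent is missing, so it is caught here.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False
```

In the test module this could back a module-level `pytestmark = pytest.mark.skipif(not module_available("google.protobuf"), reason="requires the otel dependency group")`. pytest's built-in `pytest.importorskip("google.protobuf")` achieves the same with less code.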

Checklist

  • I have added tests that prove my fix is effective. (The existing test_db_disconnected cases in src/servers/iot/tests/test_tools.py exercise the db is None path; the new guard makes that path the only outcome when env vars are unset rather than a side-effect of an exception.)
  • My code follows the project's Ruff formatting and linting rules.
  • I have signed off my commits (DCO).

iksnerd commented May 24, 2026

End-to-end verification with CouchDB running

After opening this PR I noticed the test plan only covered the cold-start (no env) path. Re-ran with docker compose -f src/couchdb/docker-compose.yaml up -d so all three DBs (iot, workorder, vibration) were seeded, then exercised the MCP tools through a connected client.

| Tool call | Result |
| --- | --- |
| iot.assets("MAIN") | ✅ 3 assets: Chiller 6, hyd_1, mp_1 (matches docs/mcp-servers.md:23-25) |
| iot.sensors("MAIN", "Chiller 6") | ✅ 11 sensors (Tonnage, Supply/Setpoint/Return Temperature, Condenser Water Flow, Power Input, …) |
| iot.history("Chiller 6", "2020-04-27T00:00:00", "2020-04-27T01:00:00") | ✅ returns total_observations: 0 cleanly; the seed data's timestamps don't cover that window, but the CouchDB query and result shape are correct |
| wo.get_failure_codes() | ✅ 143 failure codes across 9 categories (Structural/Mechanical, Corrosion, Leaks, Electrical, Control System, …) |

The iot connected path is unchanged by the guard — the existing try block is preserved verbatim under if COUCHDB_URL and COUCHDB_DBNAME:. The new behaviour (clean WARNING when env vars are unset, rather than ERROR + __del__ AttributeError) was verified earlier via env -i against .venv/bin/iot-mcp-server. Both paths covered.

Side observation while testing

The bundled .env.public fix for VIBRATION_DBNAME is load-bearing: a vibration server launched without it can't find the seeded vibration DB (4097 docs for Motor_01 / Vibration_X are present in CouchDB, but list_vibration_sensors returns "No sensors found" because the server is pointed at the wrong DB name). Adding VIBRATION_DBNAME=vibration to .env.public resolves this on next start.

iksnerd force-pushed the fix/iot-startup-and-first-run-dx branch from 8308f92 to a6c7f1e, May 24, 2026 02:17
DhavalRepo18 requested a review from ShuxinLin, May 24, 2026 03:54

The iot MCP server crashed on startup when COUCHDB_URL was unset:
`couchdb3.Database(None, url=None, ...)` raised inside __init__, and the
partially-constructed object's __del__ then double-faulted with
`AttributeError: 'Database' object has no attribute 'session'`.

Guard the construction so the server stays startable for tool discovery
even when CouchDB env vars are missing — tool calls that need the DB
return a clean error instead.

Also bundle three small first-run DX improvements that surfaced while
investigating:

- Add `VIBRATION_DBNAME=vibration` to `.env.public` (already documented
  in `docs/mcp-servers.md` but missing from the example env file).
- Ship `.mcp.json.example` at repo root so Claude Code / MCP Inspector
  users have a copy-paste config for the six stdio servers. Gitignore
  `.mcp.json` so user-local copies stay out of git.
- Document an "Inspecting a server directly" recipe in
  `docs/mcp-servers.md` (Inspector UI, `.mcp.json` template, raw stdio
  JSON-RPC) and note that `wo` returns `*_not_available` until the
  CouchDB container's seeding step has run.

Signed-off-by: iksnerd <bdrensk@me.com>
iksnerd force-pushed the fix/iot-startup-and-first-run-dx branch from a6c7f1e to 34e25aa, May 24, 2026 19:06