Start from cached capabilities when the device is offline at setup #439

Open
st03psn wants to merge 3 commits into mill1000:main from st03psn:fix/offline-tolerant-startup

st03psn commented Jul 2, 2026

Summary

Add an opt-in "Allow offline startup" option so a device that is intentionally powered off (e.g. a seasonal/portable PortaSplit) no longer surfaces the "Failed to authenticate with device." needs-attention banner on an HA restart. When enabled, the entities start unavailable and reconnect in the background — the normal "device offline" UX. The option defaults to off, so existing installs keep the current ConfigEntryNotReady behavior unchanged.

Fixes #438.

Problem

async_setup_entry raises ConfigEntryNotReady whenever the device is unreachable during setup (in authenticate(), get_capabilities(), or the coordinator's first refresh). A disconnect after setup is already handled via device.online; the banner only appears when setup re-runs while the device is off, because capabilities are queried live and there is nothing to build entities from.

Approach

  • New option allow_offline_startup (boolean, default off) in the options flow. When off, setup behaves exactly as before. All of the behavior below is gated on it.
  • Cache capabilities in config_entry.data (cached_capabilities) on the first successful get_capabilities() — done unconditionally (cheap, and makes the option work immediately once enabled). Enum values are converted to their names so the cache is JSON-serializable; written only when changed to avoid triggering the update-listener reload loop.
  • Start from cache when offline (option enabled): if the device is unreachable at a later (re)setup and a cache exists, log a warning, rebuild capabilities via override_capabilities(cached, merge=False), and use coordinator.async_refresh() (which does not raise) instead of async_config_entry_first_refresh(). Entities start unavailable because device.online is False.
  • Auto-reconnect for V3 devices: when the entry starts from cache while offline, authenticate() failed before msmart stored the token/key on the LAN connection, so the coordinator's refresh() (which lazily calls authenticate() with no credentials) could never re-establish the session — entities would stay unavailable until a manual reload. The coordinator now receives the token/key and re-authenticates in _async_update_data while the device has no stored credentials, so polling recovers automatically once the device is reachable again. The branch is skipped once authenticated (device.token is set), leaving the online path unchanged.
  • A cold start with no cache still raises ConfigEntryNotReady — entities cannot be built without capabilities.
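
The setup decision described in the bullets above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the PR's actual code: `FakeDevice`, `SetupNotReady`, and `setup_entry` are hypothetical stand-ins for the msmart device, Home Assistant's `ConfigEntryNotReady`, and `async_setup_entry`. Only the option key `allow_offline_startup` and the data key `cached_capabilities` come from the description.

```python
import asyncio


class SetupNotReady(Exception):
    """Hypothetical stand-in for Home Assistant's ConfigEntryNotReady."""


class FakeDevice:
    """Hypothetical stand-in for a device; not the msmart API."""

    def __init__(self, reachable):
        self.reachable = reachable
        self.online = False
        self.capabilities = None

    async def get_capabilities(self):
        if not self.reachable:
            raise ConnectionError("Connect failed")
        self.online = True
        self.capabilities = {"supported_modes": ["COOL", "FAN_ONLY"]}

    def override_capabilities(self, cached):
        self.capabilities = dict(cached)


async def setup_entry(device, data, options):
    """Return 'live' or 'cached'; raise SetupNotReady when neither is possible."""
    try:
        await device.get_capabilities()
    except ConnectionError as err:
        cached = data.get("cached_capabilities")
        # Fall back only when the user opted in AND a cache exists.
        if not options.get("allow_offline_startup", False) or cached is None:
            raise SetupNotReady(str(err)) from err
        device.override_capabilities(cached)
        return "cached"
    # Cache unconditionally so the option works as soon as it is enabled.
    data["cached_capabilities"] = dict(device.capabilities)
    return "live"
```

In this sketch an offline start from cache leaves `device.online` at `False`, which corresponds to entities starting unavailable, while a missing cache or a disabled option still ends in the not-ready error.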

Behavior (with the option enabled)

| Scenario | Before | After |
| --- | --- | --- |
| Online setup | entry loaded | entry loaded; caps cached once |
| Offline (re)setup, cache present | setup_retry + banner | entry loaded, entities unavailable |
| Device returns after offline start | (n/a; banner/retry) | entities auto-recover on next poll, no reload |
| Offline setup, no cache | setup_retry + banner | unchanged (ConfigEntryNotReady) |
| Option disabled (default) | | unchanged (ConfigEntryNotReady) |

Testing

Verified end-to-end on a live HA OS instance (core-2026.6.4, msmart-ng 2026.7.0) with a real V3 PortaSplit, option enabled:

  1. Device online + HA restart → entry loaded; config_entry.data.cached_capabilities written once, with enum names (COOL, FAN_ONLY, CUSTOM_FAN_SPEED, …) — JSON-serializable, matching the live device caps.
  2. Device powered off + entry reload → entry stays loaded, climate.* and sensors unavailable, no banner, log shows
    Could not authenticate with device ID …, starting from cached capabilities: Connect failed.
  3. Device powered back on (no reload) → the coordinator re-authenticates on the next poll and entities return to available on their own within one update interval (confirmed via the logbook unavailable → off transition with no intervening setup).
  4. No-cache cold start with device off → still ConfigEntryNotReady (correct fallback).

Also verified against msmart-ng==2026.7.0 that capabilities_dict() → name conversion → override_capabilities(..., merge=False) round-trips cleanly (the raw enum dict is rejected by override_capabilities, which is why the name conversion is needed), and that a failed authenticate() leaves device.token/device.key unset (the reason the coordinator must re-authenticate).
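
The enum-to-name conversion mentioned above might look something like the sketch below. It assumes the capabilities are a nested dict whose leaves may be enum members. `OperationalMode` and `to_names` are hypothetical names used only for illustration; they are not msmart's definitions or the PR's helper.

```python
import json
from enum import Enum


class OperationalMode(Enum):
    """Hypothetical enum for illustration; not msmart's definition."""

    COOL = 2
    FAN_ONLY = 5


def to_names(value):
    """Recursively replace Enum members with their names."""
    if isinstance(value, Enum):
        return value.name
    if isinstance(value, dict):
        return {key: to_names(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_names(item) for item in value]
    return value


caps = {
    "supported_modes": [OperationalMode.COOL, OperationalMode.FAN_ONLY],
    "max_temperature": 30,
}
cached = to_names(caps)                    # plain strings and numbers only
restored = json.loads(json.dumps(cached))  # survives a JSON round trip
```

Names can be mapped back to members with `OperationalMode[name]` where the consumer needs the enum again.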

st03psn and others added 3 commits July 2, 2026 16:31
Previously async_setup_entry raised ConfigEntryNotReady whenever the
device could not be reached during setup (authenticate/get_capabilities/
first refresh). For a device that is intentionally powered off part of
the time (e.g. a seasonal/portable split), a normal HA restart while the
device is off surfaces the "needs attention / Failed to authenticate"
banner even though the device is merely offline.

Cache the device capabilities in config_entry.data on the first
successful get_capabilities(). If the device is unreachable at a later
(re)setup and a cache exists, build the entities from the cached
capabilities and start with entities unavailable (device.online is
False), reconnecting in the background, instead of raising
ConfigEntryNotReady. A cold start with no cache still raises
ConfigEntryNotReady, since entities cannot be built without capabilities.

Enum values in the capabilities dict are converted to their names so the
cache is JSON-serializable for config_entry.data. The cache is only
written when changed to avoid triggering the update listener reload loop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
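
The "only written when changed" guard from this commit message can be illustrated with a small sketch. `FakeEntry` and `store_capabilities` are hypothetical; the real integration would go through Home Assistant's config entry update mechanism, which is not modeled here.

```python
class FakeEntry:
    """Hypothetical stand-in for a config entry that counts writes."""

    def __init__(self, data):
        self.data = dict(data)
        self.writes = 0

    def update(self, data):
        self.data = dict(data)
        self.writes += 1  # in the real integration a write may trigger a reload


def store_capabilities(entry, capabilities):
    """Persist capabilities only when they differ from the cached copy."""
    if entry.data.get("cached_capabilities") == capabilities:
        return False  # nothing changed, so skip the write entirely
    entry.update({**entry.data, "cached_capabilities": capabilities})
    return True
```

Comparing before writing keeps repeated setups with identical capabilities from producing repeated writes.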
…n offline start

When the entry starts from cached capabilities while the device is
offline, authenticate() fails before msmart stores the token/key on the
LAN connection. The coordinator's refresh() then calls the device's
lazy authenticate() with no credentials, so a V3 device can never
re-establish its session on its own once it comes back — the entities
stay unavailable until a manual reload re-runs async_setup_entry.

Pass the token/key to the coordinator and (re)authenticate in
_async_update_data when the device has no stored credentials, so polling
recovers automatically once the device is reachable again. The branch is
skipped once authenticated (device.token is set), so the online path is
unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
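
A rough sketch of the re-authentication branch described in this commit message, assuming a device object that exposes `token`, `key`, and `online` as the description suggests. `FakeV3Device` and `update_data` are hypothetical and do not reproduce the msmart or coordinator APIs.

```python
import asyncio


class FakeV3Device:
    """Hypothetical stand-in; attribute names mirror the description only."""

    def __init__(self):
        self.reachable = False
        self.token = None
        self.key = None
        self.online = False
        self.auth_calls = 0

    async def authenticate(self, token, key):
        self.auth_calls += 1
        if not self.reachable:
            raise ConnectionError("Connect failed")
        self.token, self.key = token, key

    async def refresh(self):
        self.online = self.reachable and self.token is not None


async def update_data(device, token, key):
    """One polling cycle: re-authenticate only while no credentials are stored."""
    if device.token is None and token and key:
        try:
            await device.authenticate(token, key)
        except ConnectionError:
            device.online = False
            return device  # still offline; try again on the next poll
    await device.refresh()
    return device
```

Once `device.token` is set, the branch is skipped, so later polls in this sketch go straight to `refresh()`.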
Add a boolean option (default off) to the options flow so the
offline-tolerant startup is opt-in. When disabled, setup keeps the
standard ConfigEntryNotReady behavior; when enabled, an unreachable
device with cached capabilities starts unavailable instead of showing a
setup error. Capabilities are still cached unconditionally so the option
works immediately once enabled.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Device merely being offline at setup shows a "Failed to authenticate" / needs-attention banner
