No retry/reconnect on controller connection failure

There is a startup race if the operator starts before the controller is ready. Retry logic is needed so the operator eventually connects once the controller comes up.

The node watcher and job controller in main.rs are spawned as tasks that exit permanently on the first transport error. If spurctld isn't ready when the operator starts, those tasks die and are never restarted. The fix is to wrap each of them in a retry loop.
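
A rough sketch of what such a retry loop could look like, not the actual main.rs code. It is synchronous and std-only so it runs on its own; the real tasks are presumably async, where the sleep would be the runtime's async sleep instead of `thread::sleep`. The names `run_with_retry` and `next_backoff`, and the choice of a capped exponential backoff, are assumptions for illustration.

```rust
use std::fmt::Debug;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: doubles the delay after a failure, capped at `max`.
fn next_backoff(current: Duration, max: Duration) -> Duration {
    current.saturating_mul(2).min(max)
}

/// Hypothetical helper: calls `task` until it returns `Ok`, sleeping between
/// failed attempts instead of giving up on the first error.
fn run_with_retry<T, E, F>(name: &str, initial: Duration, max: Duration, mut task: F) -> T
where
    F: FnMut() -> Result<T, E>,
    E: Debug,
{
    let mut delay = initial;
    loop {
        match task() {
            Ok(value) => return value,
            Err(err) => {
                // Log and back off; the loop keeps the task alive across
                // transport errors such as the controller not being up yet.
                eprintln!("{}: failed with {:?}, retrying in {:?}", name, err, delay);
                thread::sleep(delay);
                delay = next_backoff(delay, max);
            }
        }
    }
}
```

Each spawned task body would then be passed in as the `task` closure, for example `run_with_retry("node-watcher", Duration::from_secs(1), Duration::from_secs(30), || ...)`, where the closure connects and runs the watcher and returns `Err` on a transport error. The delays shown are placeholders. One design choice left open here is whether to reset the delay after a connection that stayed up for a while.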