Keep OpenClaw healthy without babysitting it.
fix-my-claw is a self-healing watchdog for long-running OpenClaw hosts. It probes Gateway health, runs official repair steps first, keeps a timestamped incident bundle for every attempt, and only escalates to AI when the standard recovery path fails.
The default AI path now uses acpx, so the tool can verify and use coding agents such as Codex and Claude as part of a guarded repair flow instead of leaving that logic in shell notes and operator memory.
Highlights • Install • Commands • Quick Start • How It Works • Probe Mode • Configuration • Systemd • Docs
flowchart LR
A["Probe gateway health + status"] --> B{"Healthy?"}
B -->|"Yes"| C["Sleep until next interval"]
B -->|"No"| D["Run official repair steps"]
D --> E{"Recovered?"}
E -->|"Yes"| C
E -->|"No"| F["Run guarded AI fallback"]
F --> G{"Recovered?"}
G -->|"Yes"| C
G -->|"No"| H["Write incident bundle + wait"]
- 🩺 OpenClaw-aware checks using
gateway healthandgateway status --require-rpc - 🛠️ Official-first recovery with configurable repair steps before AI escalation
- 🤖 AI fallback by default through
acpx, with automatic provider probing - 🔍 Real capability probing via
fix-my-claw probe, including auth-aware live dry-runs - 📦 Incident bundles under
~/.fix-my-claw/attempts/<timestamp>/ - 🧷 Operational guardrails for cooldowns, stale-lock cleanup, single-instance execution, and remote-mode safety
- 🖥️ Service-friendly deployment with bundled
systemdunits and timer files
If you already recover OpenClaw manually with commands like:
openclaw doctor --repair --non-interactiveopenclaw gateway restart
then fix-my-claw turns that runbook into a repeatable control loop:
- it checks whether OpenClaw is healthy
- it runs the same official steps you would run manually
- it records what happened
- it only reaches for AI when the standard path fails
| Command | What it does | Typical use |
|---|---|---|
fix-my-claw up |
Initialize default config if missing, then start monitoring | One-command startup |
fix-my-claw monitor |
Run the long-lived monitor loop | systemd / background service |
fix-my-claw check |
Probe health once | Health check / scripting |
fix-my-claw probe |
Validate repair paths, commands, paths, and AI capability | Preflight / debugging |
fix-my-claw repair |
Run one repair attempt now | Manual intervention |
fix-my-claw init |
Write the default config | First-time setup |
The simplest install path is directly from GitHub:
python3 -m venv .venv
source .venv/bin/activate
pip install git+https://github.com/caopulan/fix-my-claw.gitIf you already have the repository checked out:
pip install .- Python 3.9+
- OpenClaw installed and callable as
openclaw - Access to the OpenClaw workspace and state directories on the target host
acpxinstalled if you want the default AI backend to be usable immediately- Best deployed on the Gateway host itself
If openclaw is not on PATH, set [openclaw].command to an absolute path.
Start the watchdog with the default config:
fix-my-claw upThat command creates ~/.fix-my-claw/config.toml if needed, then enters the monitor loop.
Useful one-shot commands:
# Create the default config and print its path
fix-my-claw init
# Probe OpenClaw health once
fix-my-claw check --json
# Dry-run all configured repair paths, including AI backends
fix-my-claw probe --json
# Skip live AI calls and only validate static prerequisites / argv / syntax
fix-my-claw probe --no-live-ai --json
# Force one repair attempt now
fix-my-claw repair --force --jsonDefault paths:
- Config:
~/.fix-my-claw/config.toml - Log file:
~/.fix-my-claw/fix-my-claw.log - Incident bundles:
~/.fix-my-claw/attempts/<timestamp>/
Example console output:
00:05:52 | START | mode=up config=/Users/me/.fix-my-claw/config.toml
00:05:52 | WATCH | watching every 60s log=/Users/me/.fix-my-claw/fix-my-claw.log
00:06:06 | PROBE | status probe failed: rpc unavailable
00:06:08 | REPAIR | official 1/2 run=openclaw doctor --repair --non-interactive
00:06:32 | AI | config stage backend=acpx providers=codex:ok, claude:ok
Default execution flow:
- Probe OpenClaw with:
openclaw gateway health --jsonopenclaw gateway status --json --require-rpc
- If OpenClaw is healthy, do nothing and wait for the next interval.
- If OpenClaw is unhealthy, run the official repair steps:
openclaw doctor --repair --non-interactiveopenclaw gateway restart
- Re-check health after each official step.
- If OpenClaw is still unhealthy, enter AI fallback.
- Save logs, probe outputs, command outputs, and repair metadata to an incident bundle.
Default AI behavior:
ai.enabled = trueai.backend = "acpx"ai.provider = "auto"- automatic order:
codex, thenclaude ai.allow_code_changes = false
That means the default setup first tries guarded config/state remediation through acpx, not broad code changes.
The default OpenClaw command set is:
- Health probe:
openclaw gateway health --json - Status probe:
openclaw gateway status --json --require-rpc - Log capture:
openclaw logs --tail 200 - Official repair:
openclaw doctor --repair --non-interactiveopenclaw gateway restart
All of these can be overridden in the config.
There are 2 AI backends:
backend = "acpx": default unified path for coding agentsbackend = "direct": native integrations such ascodex execandopenclaw agent
When backend = "acpx" and provider = "auto":
fix-my-clawprobesacpx- then checks whether
codexandclaudeare actually callable - then tries the first usable provider
- then falls through to the next usable provider if health is not restored
When backend = "direct" and provider = "auto":
- the order is
codex, thenopenclaw openclawis checked withopenclaw models status --check --jsonprovider = "openclaw"can useopenclaw agent --localto bypass the Gateway
Important boundary:
acpx openclawis supported when explicitly selected- it is not part of the default
autoorder because it depends on the Gateway-backedopenclaw acppath
fix-my-claw probe is the preflight command for this project.
It does more than check:
- validates
gateway.modesafety - checks whether configured directories exist or can be created
- dry-runs official repair commands with
--help - validates whether configured argv references real paths
- checks AI backends/provider availability
- can perform live AI dry-runs to verify auth and runtime usability instead of only checking whether the binary exists
Example:
fix-my-claw probe --jsonIf you want a lighter preflight:
fix-my-claw probe --no-live-ai --jsonTypical human-readable output looks like:
probe summary: 8 ok, 2 warn, 3 fail, 0 skip
[OK ] config.gateway_mode: gateway.mode=local
[OK ] repair.official.1: dry-run syntax check passed
[FAIL] ai.acpx.codex: static probe failed: acpx-command-unavailable
This is useful for answering questions like:
- “Is Codex installed?”
- “Is auth really usable?”
- “Will my configured argv fail because a path is wrong?”
- “Can this host actually use the default AI backend?”
All runtime settings live in one TOML file. Generate it with fix-my-claw init, or start from examples/fix-my-claw.toml.
Key settings:
| Setting | What it controls |
|---|---|
[monitor].interval_seconds |
How often the watchdog probes OpenClaw |
[monitor].repair_cooldown_seconds |
Minimum delay between repair attempts |
[openclaw].command |
Absolute path to openclaw when PATH differs under systemd |
[openclaw].allow_remote_mode |
Allow running even when gateway.mode=remote |
[repair].official_steps |
Ordered repair commands to run before AI escalation |
[ai].enabled |
Whether AI remediation is allowed |
[ai].backend |
acpx or direct |
[ai].provider |
auto, codex, claude, or openclaw |
[ai].local |
Use openclaw agent --local when provider = "openclaw" |
[ai].acpx_permissions |
Permission mode for unattended acpx runs |
[ai].allow_code_changes |
Enable second-stage code-changing AI remediation |
Safety defaults:
gateway.mode=remoteis blocked by default- AI runs are rate-limited by cooldown and per-day limits
- all state lives under
~/.fix-my-claw
Example AI config:
[ai]
enabled = true
backend = "acpx"
provider = "auto"
acpx_command = "acpx"
acpx_permissions = "approve-all"
acpx_non_interactive_permissions = "fail"
acpx_format = "json"
timeout_seconds = 1800
allow_code_changes = falseLinux deployment files live in deploy/systemd:
fix-my-claw.service: long-running monitor loopfix-my-claw-oneshot.service+fix-my-claw.timer: periodic repair attempts
Example:
sudo mkdir -p /etc/fix-my-claw
sudo cp examples/fix-my-claw.toml /etc/fix-my-claw/config.toml
sudo cp deploy/systemd/fix-my-claw.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now fix-my-claw.serviceNotes:
- If you installed inside a virtualenv, replace
ExecStartwith the absolute path to that virtualenv'sfix-my-clawbinary. - If
openclawis not visible inside thesystemdenvironment, set[openclaw].commandto an absolute path.
fix-my-clawautomates recovery; it does not replace fixing root causesacpxis a strong default interface for Codex/Claude-style remediation, but it is still alphaacpx openclawdepends on the Gateway, so it is not the default AI fallback path for a dead Gateway- using OpenClaw-registered models during a Gateway outage requires a local or embedded path such as
openclaw agent --local - if you only need periodic checks, the timer-based deployment may be a better fit than a full monitor loop
- Example config
- systemd deployment files
- Changelog
- Contributing guide
- Code of Conduct
- Security policy
- Issue tracker
Contributions are welcome. Read CONTRIBUTING.md before opening a pull request.
Bug reports are much easier to triage when they include:
- OS and Python version
- OpenClaw version
- relevant
fix-my-clawconfig with secrets redacted - recent
~/.fix-my-claw/fix-my-claw.log - the latest attempt directory under
~/.fix-my-claw/attempts/
MIT © fix-my-claw contributors