openclaw backup create, Acceptance Ladder, and ACP / gateway probe Regression Triage Runbook

If you are about to run, or have just finished, an OpenClaw openclaw update or an image upgrade and now see the Control UI open while gateway probe times out, or ACP / CLI device streams regress on 2026.3.13+, this article answers three questions: how to snapshot with openclaw backup create before the window; how the status → gateway status → gateway probe → doctor acceptance ladder decides go-live vs rollback; and how to work a symptom-based runbook for probe failures, WebSocket 1006, and ACP “queue owner unavailable.” It complements the version migration checklist and the bad-release digest rollback guide; this page owns backup plus probe/ACP acceptance.
- Skipping backup create: rollback becomes guesswork; you cannot prove pairing and channel state from the last known-good combination.
- acpx on the host may still work; run ACP triage here before swapping models.

Upstream and community docs in 2026 increasingly define “upgrade” as a reversible state migration, not a one-shot npm install -g. openclaw backup create archives the current ~/.openclaw tree (or the Docker volume equivalent) into a named snapshot so that when probe fails repeatedly or ACP registration drops, you can restore the pre-upgrade combination in minutes. That pairs with the release-channel pin matrix: one side locks binaries (tag/digest), the other locks runtime config and pairing. It is the same FinOps mindset applied to different failure surfaces.
Teams that skip backup often discover the painful pattern on the second incident: the first rollback “worked” because someone still had an old compose file in shell history, but the third upgrade has no ticket, no archive path, and no recorded Node plus OpenClaw fingerprint. Writing backup create into the change template is cheaper than explaining to security why production tokens were re-paired under stress at 2 a.m.
| Existing long-form on site | This article covers | Intentionally not duplicated |
|---|---|---|
| Version migration checklist | Pre-upgrade backup create + post-upgrade probe ladder | Full directory moves, multi-host Gateway cutover |
| Bad-release digest rollback | When to trigger rollback after probe failure | Step-by-step compose pull / digest lock commands |
| tools.profile triage | Minimal tool probe step inside the ladder | Allowlist three-layer deep dive |
| Gateway no-reply | Exclude total silence before probe work | Channel OAuth, model routing |
openclaw backup create and directory boundary checklist

At the start of the change window, run a fixed sequence: backup → record version fingerprints → confirm a single authoritative Gateway. Exact subcommand names can vary slightly by release channel; always verify with openclaw backup --help. The principle does not change: you need a restorable local archive before you mutate production.
openclaw --version
node -v # target: v24.x; align Node before bumping OpenClaw
openclaw backup create
# optional: list existing backups
ls -la ~/.openclaw/backup 2>/dev/null || ls -la "${OPENCLAW_STATE_DIR:-$HOME/.openclaw}/backup"
# freeze the known-good combination (paste into change ticket)
openclaw gateway status
openclaw config get gateway.auth.token 2>/dev/null | head -c 8; echo "…(redacted)"
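The sequence above can be wrapped so that the fingerprint lands in a file you attach to the change ticket and the window aborts if the backup fails. This is a minimal sketch, not an official script: `record_fingerprint`, `pre_upgrade`, and the ticket file path are hypothetical names introduced here, and only commands already shown in this article are called.

```shell
#!/usr/bin/env bash
# Sketch: capture the known-good combination, then back up before any change.
# record_fingerprint and pre_upgrade are hypothetical helpers, not OpenClaw commands.

record_fingerprint() {
  local out="$1"
  {
    printf 'timestamp: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'openclaw: %s\n' "$(openclaw --version 2>&1)"
    printf 'node: %s\n' "$(node -v 2>&1)"
    printf 'state_dir: %s\n' "${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"
  } > "$out"
}

pre_upgrade() {
  local ticket_file="$1"
  record_fingerprint "$ticket_file" || return 1
  # Abort the window if no archive was produced.
  openclaw backup create || { echo "backup failed; aborting window" >&2; return 1; }
  # Append the Gateway summary; a non-zero status here is returned to the caller.
  openclaw gateway status >> "$ticket_file" 2>&1
}
```

A call such as `pre_upgrade ./change-ticket.txt` would then produce one file holding versions, state directory, and Gateway status; the file name is only an example.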
| Check | Local npm | Docker Compose | Remote Mac dedicated host |
|---|---|---|---|
| State directory | ~/.openclaw not inside iCloud/sync folders | bind mount to a fixed host path | OPENCLAW_STATE_DIR on dedicated disk, ticket-visible |
| Backup sensitivity | Usually includes tokens/pairing; store as confidential; evaluate rotation before restore | Same as local npm | Same as local npm |
| Dual Gateway | launchd plus manual on same port | compose and host both on 18789 | laptop forward plus remote both running |
| Disk headroom | Before backup: df -h free space ≥ 2× state dir size (avoid half-written archives) | Same as local npm | Same as local npm |
Note: a manual tar ~/.openclaw without the official backup command may miss versioned metadata or incremental indexes. For production windows, prefer backup create; manual tar is a second cold copy only.
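The disk headroom row in the checklist (free space at least 2× the state directory size) can be checked mechanically before running the backup. The sketch below assumes the archive is written to the same filesystem as the state directory; `has_headroom` and `check_backup_headroom` are hypothetical helper names, built only on the standard `du` and `df` utilities.

```shell
#!/usr/bin/env bash
# Sketch: verify free space >= 2x state dir size before backup create.
# Helper names are illustrative; they are not part of OpenClaw.

# has_headroom FREE_KB SIZE_KB: succeeds when FREE_KB is at least twice SIZE_KB.
has_headroom() {
  [ "$1" -ge $(( $2 * 2 )) ]
}

check_backup_headroom() {
  local dir="${1:-${OPENCLAW_STATE_DIR:-$HOME/.openclaw}}"
  local size_kb free_kb
  [ -d "$dir" ] || { echo "state dir not found: $dir" >&2; return 2; }
  # du -sk: total size in KB; df -Pk: POSIX layout, column 4 is available KB.
  size_kb=$(du -sk "$dir" | awk '{print $1}')
  free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ -n "$size_kb" ] && [ -n "$free_kb" ] || return 2
  if has_headroom "$free_kb" "$size_kb"; then
    echo "ok: ${free_kb} KB free, state dir ${size_kb} KB"
  else
    echo "insufficient headroom: ${free_kb} KB free, need $(( size_kb * 2 )) KB" >&2
    return 1
  fi
}
```

Running `check_backup_headroom` with no argument checks the default state directory; a non-zero exit is a reason to free disk space before starting the window.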
After upgrade, do not close the ticket because Control UI loads or chat returns “hello.” Use a fixed ladder; stop on first failure and capture stderr plus versions:
1. openclaw status: CLI reads config
2. openclaw gateway status: process/port/bind summary
3. openclaw gateway probe (or --json): loopback handshake and latency
4. openclaw doctor: config and dependency warnings
5. openclaw channels status --probe: run on the channels you actually use

Go-live means steps 1–4 pass in one run and step 5 passes on your real channel/tool surface. Must rollback means the same step still fails after reload/restart for two consecutive rounds and production Agents are impacted; in that case, restore from backup or follow digest rollback to the tag/digest on the ticket instead of stacking config patches on a bad build.
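The must-rollback rule can be written down as a small predicate so that on-call applies it the same way every time. This is a sketch under one assumption: each round is summarized by the number of the first failing ladder step, with 0 meaning the round passed. `should_rollback` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch of the rollback rule: same step fails in two consecutive rounds
# AND production Agents are impacted.
# should_rollback ROUND1_STEP ROUND2_STEP IMPACTED
#   ROUNDn_STEP: first failing ladder step in that round (0 = round passed)
#   IMPACTED:    "yes" when production Agents are affected
should_rollback() {
  local r1="$1" r2="$2" impacted="$3"
  [ "$r1" -ne 0 ] && [ "$r1" -eq "$r2" ] && [ "$impacted" = "yes" ]
}
```

For example, `should_rollback 3 3 yes` succeeds (probe failed twice with production impact), while `should_rollback 3 0 yes` does not, because the second round passed after the restart.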
Probe is deliberately loopback-oriented: it can fail while a browser dashboard on another path still renders, because UI static assets and WebSocket control plane do not share identical timeouts. That is why ladder ordering matters—gateway status green plus probe red is a documented 2026 pattern, not an operator hallucination.
openclaw status
openclaw gateway status
openclaw gateway probe
openclaw doctor
# Docker path: after upgrade, reload the same compose project
# docker compose pull && docker compose up -d
# docker compose restart <gateway-service>
openclaw channels status --probe
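To make "stop on first failure and capture stderr" repeatable, the ladder can be driven by a short loop. The following is a minimal sketch: `run_ladder` and the log directory layout are hypothetical, and the step list simply mirrors the commands in this article, so adjust step 5 to the channels you actually use.

```shell
#!/usr/bin/env bash
# Sketch: run the acceptance ladder in order and stop at the first failing step.
# run_ladder LOGDIR: returns 0 if all steps pass, otherwise the failing step number.
run_ladder() {
  local logdir="$1" n=0 step
  mkdir -p "$logdir" || return 2
  while IFS= read -r step; do
    n=$((n + 1))
    echo "step $n: $step"
    # Word splitting of $step is intentional: each line is a plain command.
    # stdin is redirected so the command cannot consume the step list.
    if ! $step >"$logdir/step$n.out" 2>"$logdir/step$n.err" </dev/null; then
      echo "FAILED at step $n: $step (stderr in $logdir/step$n.err)" >&2
      return "$n"
    fi
  done <<'EOF'
openclaw status
openclaw gateway status
openclaw gateway probe
openclaw doctor
openclaw channels status --probe
EOF
  echo "ladder passed: $n steps"
}
```

Because the return value is the failing step number, two runs separated by a reload/restart give you the "same step, two consecutive rounds" evidence the rollback rule asks for.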
| Symptom | Suspect first | First action |
|---|---|---|
| Probe timeout, gateway status still healthy | Provider plugin slowing startup; loopback race | Disable failing provider extension temporarily; wait before probe; on Windows compare rolling back one patch per community reports |
| WebSocket 1006 closed before connect | Token/bind/reverse-proxy Upgrade headers | Follow pairing and 1006 runbook; rule out proxy on localhost first |
| ACP “queue owner unavailable” | ACP bridge registration regression (2026.3.x) | Confirm host acpx; check version issues; pin or rollback minor—do not swap model first |
| openclaw devices list times out | CLI device stream vs Gateway version skew | Align CLI/Gateway versions; restore backup then single-step upgrade if needed |
| Channel totally silent | Channel/model layer | Jump to no-reply guide; pause this runbook |
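For the first row (probe timeout while gateway status is healthy), "wait before probe" can be made explicit with a bounded retry, which helps separate a startup race from a persistent failure. This is a sketch only: `probe_with_wait` is a hypothetical helper, and the default attempt count and delay are illustrative values, not recommendations from upstream.

```shell
#!/usr/bin/env bash
# Sketch: retry the probe a few times with a pause, to rule out a startup race.
# probe_with_wait [ATTEMPTS] [DELAY_SECONDS]  (defaults are illustrative)
probe_with_wait() {
  local attempts="${1:-3}" delay="${2:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if openclaw gateway probe; then
      echo "probe passed on attempt $i"
      return 0
    fi
    # Pause only between attempts, not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "probe still failing after $attempts attempts" >&2
  return 1
}
```

If the probe only passes on a later attempt, treat it as startup latency and look at slow provider extensions; if it never passes, continue with the table above.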
On-call often oscillates between “one more config tweak” and “rollback now.” Use the table to decide quickly (rows = blast radius, columns = recommended action):
| Blast radius | Keep fixing (config/plugins) | Pin / rollback | Temporarily disable ACP or failing provider |
|---|---|---|---|
| Probe red only, channels healthy | Log as monitor noise; fix startup latency | If SLA mandates green probe, rollback patch | Disable provider extension that slows boot |
| ACP fully down, chat OK | Inspect bridge registration and plugin discovery | Rollback minor inside known regression window | Disable ACP temporarily to protect channel SLA |
| Probe + channels + tools all down | Only after backup restore, single-step retry | Prefer backup restore or digest rollback | Not first choice |
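The matrix can also be encoded as a lookup so a paging script prints the same first recommendation every time. This sketch covers only the three rows of the table; `recommend`, its four "ok"/"down" inputs, and the output labels are hypothetical, and any combination the table does not list falls through to manual triage.

```shell
#!/usr/bin/env bash
# Sketch: map observed health to the first recommended action from the matrix.
# recommend PROBE CHANNELS TOOLS ACP  (each argument is "ok" or "down")
recommend() {
  case "$1/$2/$3/$4" in
    ok/ok/ok/ok)      echo "no-action" ;;
    down/ok/ok/ok)    echo "fix-startup-latency" ;;               # probe red only
    ok/ok/ok/down)    echo "inspect-acp-bridge" ;;                # ACP down, chat OK
    down/down/down/*) echo "restore-backup-or-digest-rollback" ;; # everything down
    *)                echo "manual-triage" ;;                     # not in the matrix
  esac
}
```

For instance, `recommend down ok ok ok` prints `fix-startup-latency`; whether you then roll back the patch still depends on whether your SLA mandates a green probe.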
- backup create + directory checklist: confirm the archive size is sane and the state dir is not on a sync volume.
- compose pull/up; one channel step per ticket (do not jump beta→stable across two channels in one window).

On multi-region remote Mac hosts, schedule upgrade windows alongside stability acceptance and disk checks. Peak-hour image pull plus full probe sweeps often mis-label network jitter as “ACP broken.” Safer pattern: upgrade and accept on an always-on, dedicated, ticketed node; laptops only SSH-forward to Control UI.
Asking chat “did upgrade work?” or tweaking two YAML keys without a ladder is not auditable and cannot be replayed on a second machine. By contrast, baking backup create, the acceptance ladder, and ACP/probe triage into a runbook turns a bad release from an evening of blind retries into a backed-up, rollback-pointed, metric-backed ten-minute incident.
If you still chase channels on a personal laptop, budget three hidden costs: sleep-induced Gateway stalls, probe paths that disagree with business traffic, and upgrade windows fighting local power policy. For 7×24 OpenClaw production Gateways with stable Node 24 baseline and ticketed change control, hosting on MACCOME Mac mini (M4 / M4 Pro) with flexible multi-region leases usually beats fighting probe timeouts on lid-closed hardware. Review multi-region node and lease guide, then wire topology with the SSH dedicated Gateway runbook.
FAQ
Does backup create before upgrade include tokens?
It usually includes auth and pairing material from the state tree; treat archives as confidential and evaluate rotation before restore. For a production-dedicated host see Mac mini rental rates.
gateway probe fails but the dashboard opens—must I roll back?
Not necessarily. Triage probe timeout, 1006, and ACP registration using the symptom table; roll back only when the same ladder step (1–5) still fails for two consecutive rounds after reload/restart and production is hurt. In that case, use backup restore or digest rollback.
What should I watch on remote Mac upgrade windows?
Avoid build peaks and tight disk; run backup on a dedicated state dir; execute probe acceptance on the remote host while the laptop only forwards. For other access issues, see cloud Mac support help and rental rates.