Teams running OpenClaw Gateway on Docker or Kubernetes in 2026 often ship fast yet still treat “container running” as healthy. Without HTTP probe paths, readiness semantics, and rolling parameters on the same change ticket, you get liveness kills during cold start, depends_on that waits for containers but not readiness, or provider 429 mistaken for a dead Gateway and endless restarts. This article scopes against the Docker production runbook, upgrade and migration checklist, and provider routing and failover, and delivers six RCA-ready pitfalls, a liveness/readiness/startup matrix, Compose versus Kubernetes mapping, copy-paste YAML snippets, a six-step rollout runbook, and three dashboard metrics—plus how to place Gateway on a stable remote Mac execution plane.
Recent OpenClaw releases add orchestration-friendly HTTP endpoints (exact paths and ports follow your pinned image tag and release notes; names such as /health, /ready, /healthz appear in the ecosystem). Log these patterns in RCAs and reuse vocabulary from doctor and post-install triage.
- depends_on without health conditions: dependents start while Gateway still cannot reach a backend socket, producing intermittent 502s.
- Loopback-only checks: 127.0.0.1 works inside the pod but the ClusterIP Service fails, and the symptom is misread as an application failure.
- Aggressive maxUnavailable: old pods drain before new pods pass readiness, leaving short full-red windows.

Compared with the cross-platform install guide: install answers first boot; production answers long-lived ops; this article answers how orchestrators decide "healthy"; upgrades answer image moves and rollback.
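The depends_on pitfall above is avoidable in Compose v2 by gating dependents on health rather than on container start. A minimal sketch, assuming a service named gateway with a healthcheck already defined (the image names and the dependent service are illustrative, not part of any official OpenClaw Compose file):

```yaml
services:
  gateway:
    image: openclaw/gateway:pinned-tag     # illustrative; pin your real digest/tag
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://127.0.0.1:${GATEWAY_PORT}/health || exit 1"]
      interval: 15s
      retries: 5
      start_period: 120s
  agent-worker:
    image: example/agent-worker:latest     # hypothetical dependent service
    depends_on:
      gateway:
        condition: service_healthy         # wait for health, not just container start
```

Without the condition, Compose only waits for the gateway container to exist, which is exactly the intermittent-502 window described above.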
Kubernetes probe types do not map 1:1 onto Docker healthcheck restart semantics; use the table below in architecture reviews.
| Check | Typical failure effect | Validates | OpenClaw-oriented guidance |
|---|---|---|---|
| startupProbe | Suppresses liveness failures until success | Slow but bounded cold start | Use when first config fetch, indexes, or dependencies take minutes |
| livenessProbe | Restart container/Pod | Deadlocks, unresponsive process | Avoid external LLM dependencies; minimal self-check only |
| readinessProbe | Remove from Service endpoints | Not ready for traffic | May include minimal model ping or config-loaded signal—align with failover policy |
| Docker healthcheck | Marks unhealthy; restart policy varies | Single-host Compose | Pair with depends_on: condition: service_healthy (syntax per Compose v2 docs) |
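To reason about the matrix above, it helps to compute the time budget each probe grants. A minimal sketch of that arithmetic (the field names match the Kubernetes probe spec; the numbers are illustrative, not OpenClaw defaults):

```python
def seconds_until_first_kill(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Worst-case seconds before a failing liveness probe triggers a restart."""
    return initial_delay + period * failure_threshold

def startup_window(period: int, failure_threshold: int) -> int:
    """Maximum cold-start time a startupProbe tolerates before giving up."""
    return period * failure_threshold

# Liveness alone (initialDelaySeconds=30, periodSeconds=20, failureThreshold=3)
# kills after 30 + 20*3 = 90s, which is too short for a multi-minute cold start.
liveness_budget = seconds_until_first_kill(initial_delay=30, period=20, failure_threshold=3)

# A startupProbe with periodSeconds=10, failureThreshold=30 suppresses liveness
# for up to 300s, covering a slow-but-bounded first boot.
startup_budget = startup_window(period=10, failure_threshold=30)

print(liveness_budget, startup_budget)  # 90 300
```

If the cold-start budget exceeds the liveness budget, the matrix says to add a startupProbe rather than inflate initialDelaySeconds.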
Translating “healthy” into concrete fields cuts midnight debate.
| Dimension | Docker Compose (pattern) | Kubernetes Deployment |
|---|---|---|
| Probe command | healthcheck.test with curl/wget | httpGet or exec |
| Startup grace | start_period | startupProbe or larger initialDelaySeconds |
| Traffic shedding | Proxy/LB layer or health label only | readinessProbe controls Endpoints |
| Rolling | Manual compose order or external CD | maxSurge / maxUnavailable / minReadySeconds |
```yaml
# Examples: replace PORT and paths with values from the docs for your tag

# Docker Compose (excerpt)
healthcheck:
  test: ["CMD-SHELL", "curl -fsS http://127.0.0.1:${GATEWAY_PORT}/health || exit 1"]
  interval: 15s
  timeout: 3s
  retries: 5
  start_period: 120s
```

```yaml
# Kubernetes (excerpt)
readinessProbe:
  httpGet:
    path: /ready
    port: http
  initialDelaySeconds: 10
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 20
```
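The Kubernetes excerpt above covers readiness and liveness but not the startup or rolling dimensions from the matrix. A hedged sketch of both (paths, port names, and thresholds are assumptions to validate against your pinned tag's docs, not confirmed OpenClaw values):

```yaml
# Container spec (excerpt): suppress liveness until slow first boot completes
startupProbe:
  httpGet:
    path: /health          # assumed path; confirm for your tag
    port: http
  periodSeconds: 10
  failureThreshold: 30     # tolerates up to 300s of cold start
```

```yaml
# Deployment spec (excerpt): keep the ready-pod floor during rollouts
spec:
  minReadySeconds: 30            # a pod must stay Ready this long before it counts
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0          # never drop below the desired ready count
```

maxUnavailable: 0 with maxSurge: 1 trades extra capacity during the rollout for zero full-red windows, which matches the pitfall list earlier.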
Warning: Upstream may add or rename /health, /ready, /healthz across 2026.3.x-style releases. Before copying snippets, confirm official docs for your digest/tag and verify with curl -v in staging.
Exercise probes against 127.0.0.1 via docker compose exec or kubectl exec, then validate through the Service. Size maxUnavailable against the maintenance window. On the Linux systemd + Tunnel path, align tunnel health, loopback listeners, and upstream LB checks; otherwise you can see "tunnel alive, Gateway not listening" false positives.
Correlate kubectl rollout status or compose upgrade logs with Git changes to separate tight probes from image regressions.
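When correlating rollout behavior, it helps to know the availability floor the rolling parameters guarantee. A minimal sketch of the arithmetic (simplified to absolute values; Kubernetes also accepts percentage forms, which are not handled here):

```python
def rollout_floor(replicas: int, max_unavailable: int) -> int:
    """Minimum number of ready pods guaranteed during a rolling update."""
    return replicas - max_unavailable

def rollout_ceiling(replicas: int, max_surge: int) -> int:
    """Maximum number of pods (old + new) that may exist at once."""
    return replicas + max_surge

# 3 replicas, maxUnavailable=1, maxSurge=1:
# never fewer than 2 ready pods, never more than 4 total.
print(rollout_floor(3, 1), rollout_ceiling(3, 1))  # 2 4
```

If an incident timeline shows ready pods dipping below the floor, suspect tight probes or an image regression rather than the rollout parameters themselves.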
Consumer gear fights sleep, disk jitter, and unscheduled OS updates—startup time and probe thresholds drift. Combined with rolling windows, that burns on-call hours. Running OpenClaw and agents under an expected SLA needs dedicated compute, stable egress, and burst-friendly nodes.
Fragmented self-hosting also makes multi-region latency and contract ops harder: probe tuning plus host reboot coupling is painful on laptops. For 24/7 observable, rollable, rollback-friendly Gateways, professional multi-region Apple Silicon cloud Macs usually beat ad-hoc hardware. MACCOME offers Mac Mini M4 / M4 Pro bare-metal with flexible terms as a Gateway or mixed automation host—start with the help center for access language, then rental rates and the multi-region guide to finalize SKUs.
Pilot: short-term rent in your target region, run container probes, Service probes, and one full rolling exercise before locking monthly or quarterly terms.
FAQ
Probes fail but the UI opens—which source wins?
Orchestrator-configured URLs and status codes. For billing context open rental rates; for probes reproduce with in-container curl in staging.
How do I use this with the Docker production article?
Production covers volumes and tokens; this covers probes and rollouts. Attach both plus upgrades to the same change.
Should 429 hit liveness?
Generally no—use provider routing and failover for backoff and routing; readiness coupling is an explicit SLO choice.
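The 429 guidance above can be encoded as an explicit decision table. A hypothetical sketch (the function and state names are illustrative, not OpenClaw APIs): liveness reflects only process health, readiness reflects serving ability, and a provider 429 fails neither by default.

```python
def probe_status(process_responsive: bool, config_loaded: bool, provider_429: bool) -> dict:
    """Map internal Gateway state to probe outcomes. Upstream rate limits are
    deliberately left to provider routing/failover, not to restarts."""
    return {
        "liveness": process_responsive,                      # restart only a dead or deadlocked process
        "readiness": process_responsive and config_loaded,   # shed traffic until we can serve
        # provider_429 is intentionally ignored: backoff and rerouting handle it
    }

# A rate-limited provider leaves both probes green.
assert probe_status(True, True, provider_429=True) == {"liveness": True, "readiness": True}
# Missing config sheds traffic without triggering a restart.
assert probe_status(True, False, provider_429=False)["readiness"] is False
```

Coupling readiness to upstream health is still possible, but as the FAQ notes it should be an explicit SLO choice, not a default.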