2026 OpenClaw production reverse proxy & TLS:
WebSocket, subpath vs subdomain, and 502/handshake triage behind Nginx/Caddy

~16 min read · MACCOME

Gateway looks healthy in the container, but behind Nginx/Caddy you see 502s, failed WebSocket handshakes, or endless redirects? This article focuses on the edge reverse proxy + TLS termination + WebSocket upgrade layer. It complements Docker network triage: that piece covers which packets can see which peers; this one covers whether the browser-to-Gateway HTTP/1.1 upgrade chain and paths are correct. You get six common pitfalls, three reference tables (topology, headers, symptoms), copy-paste snippets, a seven-step production runbook, and three on-call metrics. After reading, you can tell whether to fix Upgrade headers or go back to the upstream 127.0.0.1:18789.

Six reverse-proxy pitfalls (why curl can reach upstream yet you still get 502)

  1. proxy_pass without HTTP/1.1 and Upgrade: browsers using WebSocket get 400/502 at the edge while logs look like the Gateway died.
  2. Double-prefix subpaths: the proxy strips /openclaw but the app still emits absolute URLs from /, splitting static assets and WS paths.
  3. Mixing HTTP/2 listeners with WS without checking ALPN: some stacks need a dedicated server for WS or an explicit HTTP/1.1 fallback, otherwise handshakes fail intermittently.
  4. proxy_read_timeout too short: long reasoning bursts or channel traffic get silently cut at the edge, triggering reconnect storms that hammer upstream.
  5. Incomplete certificate chains fixed only for browser warnings: stricter automation clients may fail while desktop CLIs look fine.
  6. CDN default caching or short idle timeouts: control-plane WebSocket traffic is cached like a plain GET or dropped early, feeling like random disconnects.
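
For pitfall 2, the rule is to strip the prefix exactly once, on exactly one side. A minimal Nginx sketch, assuming the app is then configured (via its base-URL setting) to emit links that still resolve under /openclaw; the path and port are the examples used throughout this article:

```nginx
# Sketch: strip /openclaw once at the edge; the app side must then emit
# URLs that still resolve under /openclaw (base-URL setting on the app)
location /openclaw/ {
    # Trailing slash on proxy_pass replaces the matched /openclaw/ prefix with /
    proxy_pass http://127.0.0.1:18789/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
```

If you instead keep the prefix at the edge (no trailing slash on proxy_pass), the app must serve from /openclaw itself; doing both is the double-prefix bug from pitfall 2.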

The topology table below flattens same-host proxy, separate proxy host, subdomain, and subpath trade-offs before we move to header checklists and the symptom matrix.

Topology choices: same host, separate host, subpath, and subdomain—how they change triage

A same-host reverse proxy (Nginx/Caddy colocated with Gateway) keeps triage shortest: curl -v http://127.0.0.1:18789 validates upstream from the box. A separate proxy adds hops—recheck security groups, internal DNS, and TLS SNI. Subdomains usually dodge path stripping and Cookie path fights; subpaths suit a single branded hostname but you must strip the prefix only once on both OpenClaw base URL and the proxy, then confirm in devtools that WS URLs still land on the expected wss:// path.

In change requests, attach two screenshots: the final browser bar URL, and the Network tab WebSocket handshake Request URL. Without them, reviews often misfile the issue as model timeouts or channel OAuth and burn release time on the wrong plane. If you also run a corporate HTTPS proxy, separate office-browser paths (PAC injection) from datacenter proxy paths (often direct)—mismatched symptoms are normal, not exotic.

HTTP/2 at the edge with HTTP/1.1 WebSocket upstreams remains common in 2026: confirm which hop allows the upgrade. Do not enable forced H2 and custom WS rewrites simultaneously before you capture packets, or triage turns into guesswork.
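
One defensible shape is h2 toward browsers with an explicit HTTP/1.1 upstream hop for the upgrade. A sketch for Nginx 1.25+ (older releases put http2 on the listen line); TLS directives are omitted and /ws is a hypothetical WS path:

```nginx
# Sketch: h2 for browsers via ALPN, explicit HTTP/1.1 on the upstream hop
server {
    listen 443 ssl;
    http2 on;                        # nginx >= 1.25.1; older: listen 443 ssl http2;
    server_name openclaw.example.com;
    # ssl_certificate / ssl_certificate_key omitted

    location /ws {                   # hypothetical WS path
        proxy_pass http://127.0.0.1:18789;
        proxy_http_version 1.1;      # the upgrade happens on this HTTP/1.1 hop
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```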

| Topology | Best for | Ops upside | Typical pitfalls |
| --- | --- | --- | --- |
| Same host + loopback upstream | Single VPS / always-on remote Mac | Local curl triage; TLS and process share log volumes | Binding 0.0.0.0 widens exposure; use firewall guardrails |
| Dedicated proxy host | Many backends, blue/green, WAF fronting | Edge decoupled from compute; centralized renewals | Internal routes, SNI, health-check targets drift |
| Subdomain | WS separated from the main site | Clear path semantics; simpler CDN rules | Multiple certs; HSTS needs its own review |
| Subpath | Single branded entry domain | Lower cognitive load for users | Prefix stripping, redirect loops, wrong asset roots |

Nginx and Caddy: WebSocket headers and timeout checklist

Whichever proxy you run, the edge must upgrade to HTTP/1.1, pass or rebuild Connection: upgrade and Upgrade: websocket, and set read timeouts above your slowest model percentile. Use the table as a code-review checklist.

Easy to miss: proxy buffering. proxy_buffering can help throughput for normal APIs, but streaming or long-lived connections may add latency or feel truncated; if you use streaming channels or SSE-like flows, load-test staging before leaving default buffering on.
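
A minimal sketch of scoping that change instead of flipping it globally, assuming a hypothetical /stream path for long-lived flows:

```nginx
# Sketch: disable buffering only where streaming semantics matter
location /stream {
    proxy_pass http://127.0.0.1:18789;
    proxy_http_version 1.1;
    proxy_buffering off;         # flush upstream bytes to the client immediately
    proxy_cache off;
    proxy_read_timeout 3600s;
}
```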

| Check | Nginx notes | Caddy notes |
| --- | --- | --- |
| Protocol version | proxy_http_version 1.1; | reverse_proxy defaults to HTTP/1.1; watch explicit transports |
| Upgrade chain | proxy_set_header Upgrade $http_upgrade; Connection "upgrade"; | Usually automatic; custom paths need header_up |
| Host and client IP | proxy_set_header Host $host; plus X-Forwarded-* | header_up Host {host}; trust proxy hop counts |
| Long-lived connections | proxy_read_timeout / proxy_send_timeout | transport http { read_timeout ... } (syntax per release docs) |
| Large bodies | client_max_body_size | Body limit directives (see Caddy docs) |
```nginx
# Snippet: reverse-proxy to local Gateway (port per your deploy; example 18789)
location / {
  proxy_pass http://127.0.0.1:18789;
  proxy_http_version 1.1;
  proxy_set_header Host $host;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Proto $scheme;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "upgrade";
  proxy_read_timeout 3600s;
  proxy_send_timeout 3600s;
}
```
```Caddyfile
# Snippet: automatic TLS + WebSocket upstream
openclaw.example.com {
    reverse_proxy 127.0.0.1:18789 {
        header_up Host {host}
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }
}
```

Tip: After edits, validate handshake semantics upstream before testing public wss://, for example: curl -v -H "Connection: Upgrade" -H "Upgrade: websocket" -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: $(openssl rand -base64 16)" http://127.0.0.1:18789/. A 101 (or an application-level 4xx rather than a proxy-shaped 502) proves the upgrade headers survive the hop; otherwise TLS and proxy layers blame each other.

Symptom matrix: 502, 413, non-101 WS, redirects, certificates

Map symptoms to edge vs upstream vs app config before rebuilding containers. If the table points to container networking, return to Docker network triage; if channels connect but Slack/Telegram stay silent, use channel integration troubleshooting.

For production-only repros while staging is clean, diff three things: certificate chain CA, CDN/WAF rule versions, and whether proxy map/if branches on User-Agent. Many 502s are edge rule false positives, unrelated to OpenClaw versions.

| Symptom | Suspect first | Verify |
| --- | --- | --- |
| 502 Bad Gateway | Upstream not listening, firewall, wrong socket family | curl upstream from the proxy; read error.log for "upstream timed out" / "connection refused" |
| 413 Request Entity Too Large | Low client_max_body_size at the edge | Raise and reload; sync CDN/WAF body limits |
| WS not 101 | Missing Upgrade, rewritten path, http→https hop breaking handshake | Check handshake status in the browser Network tab; review location order and return 301 rules |
| Redirect loops | HTTPS forced without X-Forwarded-Proto | Temporarily relax edge HSTS for testing; add forwarded headers |
| Cert warnings / partial client failures | Incomplete chain, wrong SNI, IP certificates | openssl s_client -servername to verify the chain; check NTP |

Seven-step runbook: from pilot hostname to an on-call-ready production entry

  1. Freeze the URL plan: document public https:// base, subpath or not, and whether WS shares the HTTP host.
  2. Local probe: from the proxy host, hit loopback upstream and confirm OpenClaw process/port matches docs (example 18789).
  3. Minimal Nginx/Caddy snippet: proxy only the health path first, then widen; each step rolls back cleanly.
  4. Capture one full WS session: verify in both browser and CLI; align logs on request_id if present.
  5. Layer security: IP allow lists, rate limits, custom WAF rules; align token posture with Docker production runbook.
  6. Observability baseline: record upstream P95, WebSocket reconnects per minute, and certificate days remaining for alerts.
  7. Rollback drill: in a maintenance window, practice rolling back proxy config only (not the image) so the runbook has executable commands, not folklore.
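
Step 3 can start as small as one location block; /health is a hypothetical endpoint name here, so substitute whatever your Gateway actually exposes:

```nginx
# Sketch: proxy only a health path first, widen once it behaves
location = /health {
    proxy_pass http://127.0.0.1:18789/health;   # hypothetical endpoint
}
# later: location / { proxy_pass http://127.0.0.1:18789; ... }
```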

Three hard metrics for Grafana and the on-call guide

  1. Upstream RTT P95: from proxy workers to 127.0.0.1:18789 or an internal VIP—ignores public latency so you can tell edge slowness from Gateway slowness.
  2. Abnormal WebSocket close rate: aggregate per minute; spikes often tie to timeouts, releases, or model-side 429s—cross-check provider routing articles.
  3. Certificate validity window: ticket under 14 days remaining; rehearse renewals for Let's Encrypt and commercial CAs in staging first.
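
Metric 3 can be fed by a small shell helper. A sketch, assuming GNU date (macOS needs date -j -f) and a PEM file you have exported, for example via openssl s_client:

```shell
# Sketch: days remaining on a PEM certificate, for the <14-day ticket threshold
cert_days_left() {
  end=$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)
  end_s=$(date -d "$end" +%s)   # GNU date; macOS: date -j -f "%b %e %T %Y %Z"
  now_s=$(date +%s)
  echo $(( (end_s - now_s) / 86400 ))
}

# usage (hypothetical path): cert_days_left /etc/ssl/openclaw/fullchain.pem
```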

If you have centralized logging, plot proxy access 5xx rate against Gateway application error rate: when only one spikes, ownership is obvious and postmortems stay data-driven.
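
If all you have is raw access logs, the proxy-side 5xx rate can be approximated per minute with awk. A sketch, assuming the default combined log_format (status in field 9, timestamp in field 4); adjust the field numbers if your format differs:

```shell
# Sketch: per-minute 5xx rate from combined-format access log lines on stdin
fivexx_per_minute() {
  awk '{
    minute = substr($4, 2, 17)                 # dd/Mon/yyyy:HH:MM
    total[minute]++
    if ($9 ~ /^5/) errs[minute]++
  }
  END {
    for (m in total)
      printf "%s 5xx=%d total=%d rate=%.2f%%\n", m, errs[m], total[m], 100 * errs[m] / total[m]
  }'
}

# usage: fivexx_per_minute < /var/log/nginx/access.log
```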

Why a laptop-only Gateway struggles with production ingress and long-running automation

Laptop proxy experiments validate snippets quickly, but sleep, VPN flips, and unstable home uplinks turn TLS and renewals into manual chores; teams cannot contract on 502 SLOs. Putting reverse proxy + always-on Gateway on dedicated 24/7 hosts (for example rented Apple Silicon cloud Macs or controlled VPS) makes rate limits, certificates, and log retention standard changes instead of whoever-is-awake ops. Stable ingress is what makes OpenClaw + CI + signing pipelines auditable and handover-friendly.

Self-managed proxy stacks also chase OpenSSL, Nginx, and OS CVEs; for teams binding OpenClaw to CI, signing, and multi-channel bots, dedicated and predictable compute with clear region and term choices usually beats home-egress experiments on total cost. MACCOME cloud Macs offer multi-region nodes and predictable rental tiers as the clean always-on host behind TLS termination: land directory and permission boundaries from the three-platform install guide, then place execution on contract-grade hardware using Mac mini rental rates and region pages (Singapore, Tokyo, Seoul, Hong Kong, Virginia, Silicon Valley) while the edge proxy handles policy and observability only.

For connection, session, and channel issues, search the help center by keyword; pair with remote-desktop and SSH/VNC articles when you need a visual triage window.

FAQ

502—check the proxy or Docker first?

curl loopback upstream from the proxy host; if it fails, return to Docker network triage. For plans and pricing, see Mac mini rental rates.

Subpath or subdomain?

Subdomains usually avoid stripping pitfalls; subpaths need matching base URLs on both sides. Start installs from the three-platform install guide to align versions.
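
On the Caddy side, handle_path is the one-strip primitive. A sketch, assuming the app is configured to emit URLs that resolve under /openclaw (hostname and port are the examples from this article):

```Caddyfile
# Sketch: handle_path strips /openclaw before proxying
openclaw.example.com {
    handle_path /openclaw/* {
        reverse_proxy 127.0.0.1:18789
    }
}
```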

WebSocket drops behind a CDN?

Disable caching, lengthen idle timeouts, or bypass the CDN on a control-plane subdomain. Channel issues belong in Slack/Discord/Telegram troubleshooting.

Logs and tickets—where?

Enterprise changes should flow through the help center so half an Nginx config does not die in chat.