2026 OpenClaw production reverse proxy & TLS:
WebSocket, subpath vs subdomain, and 502/handshake triage behind Nginx/Caddy

~16 min read · MACCOME

Gateway looks healthy in the container, but behind Nginx/Caddy you see 502s, failed WebSocket handshakes, or endless redirects? This article focuses on the edge reverse proxy + TLS termination + WebSocket upgrade layer. It complements Docker network triage: that piece covers which packets can see which peers; this one covers whether the browser-to-Gateway HTTP/1.1 upgrade chain and paths are correct. You get six common pitfalls, three reference tables (topology, headers, symptoms), copy-paste snippets, a seven-step production runbook, and three on-call metrics. After reading, you can tell whether to fix Upgrade headers or go back to the upstream 127.0.0.1:18789.

Six reverse-proxy pitfalls (why curl can reach upstream yet you still get 502)

  1. proxy_pass without HTTP/1.1 and Upgrade: browsers using WebSocket get 400/502 at the edge while logs look like the Gateway died.
  2. Double-prefix subpaths: the proxy strips /openclaw but the app still emits absolute URLs from /, splitting static assets and WS paths.
  3. Mixing HTTP/2 listeners with WS without checking ALPN: some stacks need a dedicated server for WS or an explicit HTTP/1.1 fallback, otherwise handshakes fail intermittently.
  4. proxy_read_timeout too short: long reasoning bursts or channel traffic get silently cut at the edge, triggering reconnect storms that hammer upstream.
  5. Incomplete certificate chains fixed only for browser warnings: stricter automation clients may fail while desktop CLIs look fine.
  6. CDN default caching or short idle timeouts: control-plane WebSocket traffic is cached like a plain GET or dropped early, feeling like random disconnects.
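
For pitfall 2, the rule is to strip the prefix exactly once, on exactly one side. A minimal Nginx sketch, assuming the app is then configured (via its base-URL setting) to emit links that still resolve under /openclaw; the path and port are the examples used throughout this article:

```nginx
# Sketch: strip /openclaw once at the edge; the app side must then emit
# URLs that still resolve under /openclaw (base-URL setting on the app)
location /openclaw/ {
    # Trailing slash on proxy_pass replaces the matched /openclaw/ prefix with /
    proxy_pass http://127.0.0.1:18789/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
```

If you instead keep the prefix at the edge (no trailing slash on proxy_pass), the app must serve from /openclaw itself; doing both is the double-prefix bug from pitfall 2.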

The topology table below flattens same-host proxy, separate proxy host, subdomain, and subpath trade-offs before we move to header checklists and the symptom matrix.

Topology choices: same host, separate host, subpath, and subdomain—how they change triage

A same-host reverse proxy (Nginx/Caddy colocated with Gateway) keeps triage shortest: curl -v http://127.0.0.1:18789 validates upstream from the box. A separate proxy adds hops—recheck security groups, internal DNS, and TLS SNI. Subdomains usually dodge path stripping and Cookie path fights; subpaths suit a single branded hostname but you must strip the prefix only once on both OpenClaw base URL and the proxy, then confirm in devtools that WS URLs still land on the expected wss:// path.

In change requests, attach two screenshots: the final browser bar URL, and the Network tab WebSocket handshake Request URL. Without them, reviews often misfile the issue as model timeouts or channel OAuth and burn release time on the wrong plane. If you also run a corporate HTTPS proxy, separate office-browser paths (PAC injection) from datacenter proxy paths (often direct)—mismatched symptoms are normal, not exotic.

HTTP/2 at the edge with HTTP/1.1 WebSocket upstreams remains common in 2026: confirm which hop allows the upgrade. Do not enable forced H2 and custom WS rewrites simultaneously before you capture packets, or triage turns into guesswork.
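
One defensible shape is h2 toward browsers with an explicit HTTP/1.1 upstream hop for the upgrade. A sketch for Nginx 1.25+ (older releases put http2 on the listen line); TLS directives are omitted and /ws is a hypothetical WS path:

```nginx
# Sketch: h2 for browsers via ALPN, explicit HTTP/1.1 on the upstream hop
server {
    listen 443 ssl;
    http2 on;                        # nginx >= 1.25.1; older: listen 443 ssl http2;
    server_name openclaw.example.com;
    # ssl_certificate / ssl_certificate_key omitted

    location /ws {                   # hypothetical WS path
        proxy_pass http://127.0.0.1:18789;
        proxy_http_version 1.1;      # the upgrade happens on this HTTP/1.1 hop
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```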

| Topology | Best for | Ops upside | Typical pitfalls |
| --- | --- | --- | --- |
| Same host + loopback upstream | Single VPS / always-on remote Mac | Local curl triage; TLS and process share log volumes | Binding 0.0.0.0 widens exposure; use firewall guardrails |
| Dedicated proxy host | Many backends, blue/green, WAF fronting | Edge decoupled from compute; centralized renewals | Internal routes, SNI, health-check targets drift |
| Subdomain | WS separated from the main site | Clear path semantics; simpler CDN rules | Multiple certs; HSTS needs its own review |
| Subpath | Single branded entry domain | Lower cognitive load for users | Prefix stripping, redirect loops, wrong asset roots |

Nginx and Caddy: WebSocket headers and timeout checklist

Whichever proxy you run, the edge must upgrade to HTTP/1.1, pass or rebuild Connection: upgrade and Upgrade: websocket, and set read timeouts above your slowest model percentile. Use the table as a code-review checklist.

Easy to miss: proxy buffering. proxy_buffering can help throughput for normal APIs, but streaming or long-lived connections may add latency or feel truncated; if you use streaming channels or SSE-like flows, load-test staging before leaving default buffering on.
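
A minimal sketch of scoping that change instead of flipping it globally, assuming a hypothetical /stream path for long-lived flows:

```nginx
# Sketch: disable buffering only where streaming semantics matter
location /stream {
    proxy_pass http://127.0.0.1:18789;
    proxy_http_version 1.1;
    proxy_buffering off;         # flush upstream bytes to the client immediately
    proxy_cache off;
    proxy_read_timeout 3600s;
}
```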

| Check | Nginx notes | Caddy notes |
| --- | --- | --- |
| Protocol version | proxy_http_version 1.1; | reverse_proxy defaults to HTTP/1.1; watch explicit transports |
| Upgrade chain | proxy_set_header Upgrade $http_upgrade; Connection "upgrade"; | Usually automatic; custom paths need header_up |
| Host and client IP | proxy_set_header Host $host; plus X-Forwarded-* | header_up Host {host}; trust proxy hop counts |
| Long-lived connections | proxy_read_timeout / proxy_send_timeout | transport http { read_timeout ... } (syntax per release docs) |
| Large bodies | client_max_body_size | Body limit directives (see Caddy docs) |
```nginx
# Snippet: reverse-proxy to local Gateway (port per your deploy; example 18789)
location / {
  proxy_pass http://127.0.0.1:18789;
  proxy_http_version 1.1;
  proxy_set_header Host $host;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Proto $scheme;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "upgrade";
  proxy_read_timeout 3600s;
  proxy_send_timeout 3600s;
}
```
```Caddyfile
# Snippet: automatic TLS + WebSocket upstream
openclaw.example.com {
    reverse_proxy 127.0.0.1:18789 {
        header_up Host {host}
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }
}
```

Tip: After edits, validate handshake semantics upstream before testing public wss://, for example: curl -v -H "Connection: Upgrade" -H "Upgrade: websocket" -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: $(openssl rand -base64 16)" http://127.0.0.1:18789/. A 101 (or an application-level 4xx rather than a proxy-shaped 502) proves the upgrade headers survive the hop; otherwise TLS and proxy layers blame each other.

Symptom matrix: 502, 413, non-101 WS, redirects, certificates

Map symptoms to edge vs upstream vs app config before rebuilding containers. If the table points to container networking, return to Docker network triage; if channels connect but Slack/Telegram stay silent, use channel integration troubleshooting.

For production-only repros while staging is clean, diff three things: certificate chain CA, CDN/WAF rule versions, and whether proxy map/if branches on User-Agent. Many 502s are edge rule false positives, unrelated to OpenClaw versions.

| Symptom | Suspect first | Verify |
| --- | --- | --- |
| 502 Bad Gateway | Upstream not listening, firewall, wrong socket family | curl upstream from the proxy; read error.log for "upstream timed out" / "connection refused" |
| 413 Request Entity Too Large | Low client_max_body_size at the edge | Raise and reload; sync CDN/WAF body limits |
| WS not 101 | Missing Upgrade, rewritten path, http→https hop breaking handshake | Check handshake status in the browser Network tab; review location order and return 301 rules |
| Redirect loops | HTTPS forced without X-Forwarded-Proto | Temporarily relax edge HSTS for testing; add forwarded headers |
| Cert warnings / partial client failures | Incomplete chain, wrong SNI, IP certificates | openssl s_client -servername to verify the chain; check NTP |

Seven-step runbook: from pilot hostname to an on-call-ready production entry

  1. Freeze the URL plan: document public https:// base, subpath or not, and whether WS shares the HTTP host.
  2. Local probe: from the proxy host, hit loopback upstream and confirm OpenClaw process/port matches docs (example 18789).
  3. Minimal Nginx/Caddy snippet: proxy only the health path first, then widen; each step rolls back cleanly.
  4. Capture one full WS session: verify in both browser and CLI; align logs on request_id if present.
  5. Layer security: IP allow lists, rate limits, custom WAF rules; align token posture with Docker production runbook.
  6. Observability baseline: record upstream P95, WebSocket reconnects per minute, and certificate days remaining for alerts.
  7. Rollback drill: in a maintenance window, practice rolling back proxy config only (not the image) so the runbook has executable commands, not folklore.
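
Step 3 can start as small as one location block; /health is a hypothetical endpoint name here, so substitute whatever your Gateway actually exposes:

```nginx
# Sketch: proxy only a health path first, widen once it behaves
location = /health {
    proxy_pass http://127.0.0.1:18789/health;   # hypothetical endpoint
}
# later: location / { proxy_pass http://127.0.0.1:18789; ... }
```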

Three hard metrics for Grafana and the on-call guide

  1. Upstream RTT P95: from proxy workers to 127.0.0.1:18789 or an internal VIP—ignores public latency so you can tell edge slowness from Gateway slowness.
  2. Abnormal WebSocket close rate: aggregate per minute; spikes often tie to timeouts, releases, or model-side 429s—cross-check provider routing articles.
  3. Certificate validity window: ticket under 14 days remaining; rehearse renewals for Let's Encrypt and commercial CAs in staging first.
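
Metric 3 can be fed by a small shell helper. A sketch, assuming GNU date (macOS needs date -j -f) and a PEM file you have exported, for example via openssl s_client:

```shell
# Sketch: days remaining on a PEM certificate, for the <14-day ticket threshold
cert_days_left() {
  end=$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)
  end_s=$(date -d "$end" +%s)   # GNU date; macOS: date -j -f "%b %e %T %Y %Z"
  now_s=$(date +%s)
  echo $(( (end_s - now_s) / 86400 ))
}

# usage (hypothetical path): cert_days_left /etc/ssl/openclaw/fullchain.pem
```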

If you have centralized logging, plot proxy access 5xx rate against Gateway application error rate: when only one spikes, ownership is obvious and postmortems stay data-driven.
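
If all you have is raw access logs, the proxy-side 5xx rate can be approximated per minute with awk. A sketch, assuming the default combined log_format (status in field 9, timestamp in field 4); adjust the field numbers if your format differs:

```shell
# Sketch: per-minute 5xx rate from combined-format access log lines on stdin
fivexx_per_minute() {
  awk '{
    minute = substr($4, 2, 17)                 # dd/Mon/yyyy:HH:MM
    total[minute]++
    if ($9 ~ /^5/) errs[minute]++
  }
  END {
    for (m in total)
      printf "%s 5xx=%d total=%d rate=%.2f%%\n", m, errs[m], total[m], 100 * errs[m] / total[m]
  }'
}

# usage: fivexx_per_minute < /var/log/nginx/access.log
```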

Why a laptop-only Gateway struggles with production ingress and long-running automation

Laptop proxy experiments validate snippets quickly, but sleep, VPN flips, and unstable home uplinks turn TLS and renewals into manual chores; teams cannot contract on 502 SLOs. Putting reverse proxy + always-on Gateway on dedicated 24/7 hosts (for example rented Apple Silicon cloud Macs or controlled VPS) makes rate limits, certificates, and log retention standard changes instead of whoever-is-awake ops. Stable ingress is what makes OpenClaw + CI + signing pipelines auditable and handover-friendly.

Self-managed proxy stacks also chase OpenSSL, Nginx, and OS CVEs; for teams binding OpenClaw to CI, signing, and multi-channel bots, dedicated and predictable compute with clear region and term choices usually beats home-egress experiments on total cost. MACCOME cloud Macs offer multi-region nodes and predictable rental tiers as the clean always-on host behind TLS termination: land directory and permission boundaries from the three-platform install guide, then place execution on contract-grade hardware using Mac mini rental rates and region pages (Singapore, Tokyo, Seoul, Hong Kong, Virginia, Silicon Valley) while the edge proxy handles policy and observability only.

For connection, session, and channel issues, search the help center by keyword; pair with remote-desktop and SSH/VNC articles when you need a visual triage window.

FAQ

502—check the proxy or Docker first?

curl loopback upstream from the proxy host; if it fails, return to Docker network triage. For plans and pricing, see Mac mini rental rates.

Subpath or subdomain?

Subdomains usually avoid stripping pitfalls; subpaths need matching base URLs on both sides. Start installs from the three-platform install guide to align versions.
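
On the Caddy side, handle_path is the one-strip primitive. A sketch, assuming the app is configured to emit URLs that resolve under /openclaw (hostname and port are the examples from this article):

```Caddyfile
# Sketch: handle_path strips /openclaw before proxying
openclaw.example.com {
    handle_path /openclaw/* {
        reverse_proxy 127.0.0.1:18789
    }
}
```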

WebSocket drops behind a CDN?

Disable caching, lengthen idle timeouts, or bypass the CDN on a control-plane subdomain. Channel issues belong in Slack/Discord/Telegram troubleshooting.

Logs and tickets—where?

Enterprise changes should flow through the help center so half an Nginx config does not die in chat.