On a leased or self-managed remote Mac, running OpenClaw Gateway on the same machine as Ollama or local vLLM rarely fails because «the model string is wrong». It fails when ports, probe order, unified memory and CPU contention, and start/stop hygiene stack together. This article splits work with the offline model triage guide: that post owns API bridges, context limits, and no-reply flows; this one is the same-box topology runbook. You should leave knowing how to separate listeners, which layer to verify before healthz, and why you downgrade the model before you thrash Gateway when everything is pegged.
Ollama defaults to 11434, local vLLM commonly to 8000, the OpenClaw Control UI often to 18789; another local reverse proxy or sidecar is the usual double-bind. Gateway healthz can return 200 while the provider still points at a cold or broken inference endpoint, so the UI loads but the chat prints nothing. Put the table in the review so «we all meant port 8000» does not turn into «something else took 8000 in week three».
| Component | Typical default | Touchpoint with OpenClaw |
|---|---|---|
| Ollama | HTTP on 127.0.0.1:11434 by default; open the LAN only with an explicit bind and firewall story | Provider baseURL to the OpenAI-compatible surface; avoid blind reverse proxies with no upstream health |
| vLLM (local) | Often 8000 or custom; multiple instances need disjoint ports and GPU or thread pools | Same as Ollama: prove /v1/models and a minimal completion before Gateway references it |
| OpenClaw Gateway | Control UI often 18789; follow your real openclaw config | healthz / readyz first, provider second; see Gateway and model triage |
Prove the provider with a minimal chat/completions with curl, then start or reload Gateway so «not ready» never becomes sticky state. Keep listeners on 127.0.0.1; if a container must reach the host, document bridge rules and who owns the firewall. Layer one is the provider, layer two is Gateway healthz, layer three is an end-to-end chat probe; if any layer is red, keep production traffic away.

```bash
# Minimal probe order (rewrite host/port)
curl -sS "http://127.0.0.1:11434/api/tags" > /dev/null     # Ollama alive
# curl -sS "http://127.0.0.1:8000/v1/models" > /dev/null   # vLLM
curl -fsS "http://127.0.0.1:18789/healthz"                 # Gateway
# Then one short chat completion or openclaw doctor, whatever your install documents
```
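Where the last comment says «one short chat completion», a minimal sketch against Ollama's OpenAI-compatible surface could look like the following; the model name `llama3.1` is only a placeholder, use whatever `/api/tags` actually lists on your box.

```bash
# Hypothetical end-to-end chat probe against the provider (not Gateway).
# "llama3.1" is a placeholder model name; substitute one returned by /api/tags.
curl -sS "http://127.0.0.1:11434/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "Reply with the single word: pong"}],
        "max_tokens": 8
      }'
# A JSON body with choices[0].message.content proves the model loads and generates;
# only then is a Gateway-level failure worth debugging as a Gateway problem.
```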
Sleep/wake cycles, residential uplinks, or unpredictable neighbors turn reproducible start/stop and probing into a game of chance. Co-hosted Ollama plus Gateway needs a stable thermal envelope and predictable I/O. Moving the agent and inference onto a dedicated, 24/7 box with a contract-backed memory and disk profile often beats endless tuning. For production-grade runbooks, MACCOME cloud Macs pair dedicated Apple Silicon with lease models you can put in a ledger, so you fight over resource tables and change control, not luck.
Do not hammer Gateway concurrency before you have a known-good single completion on the provider. Do not bump Gateway before you confirm nothing else owns the inference port. Two clean cuts beat ten pages of tribal knowledge.
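A quick way to confirm nothing else owns an inference port before touching Gateway, a sketch assuming macOS with the stock lsof; swap the port numbers for whatever your own table says:

```bash
# Who actually owns each port from the table? (ports are examples; match your config)
for port in 11434 8000 18789; do
  echo "== port ${port} =="
  lsof -nP -iTCP:${port} -sTCP:LISTEN || echo "nothing listening on ${port}"
done
```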
FAQ
How does this split work with the offline Ollama/vLLM article?
That article covers API bridging, context, and no-reply triage. This one covers same-machine ports, probes, resources, and start/stop. Pair offline private model triage with Gateway triage.
Gateway in Docker with Ollama on the host—is that «co-hosted»?
Same logical machine, but the networking must be explicit beyond the port list: host.docker.internal or bridge IPs, plus verifiable firewall rules. Start from Docker production deployment and the official Docker guide.
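As a sketch of what «explicit networking» means in practice, assuming Docker Desktop on macOS where host.docker.internal resolves to the host by default; the image and ports below are examples, not part of the official guide:

```bash
# From a throwaway container, prove the containerized Gateway could reach
# the host-side Ollama before wiring it into the provider baseURL.
docker run --rm curlimages/curl:latest \
  -sS "http://host.docker.internal:11434/api/tags"

# If that works, the provider baseURL inside the Gateway container points at
# http://host.docker.internal:11434 (or its /v1 OpenAI-compatible surface),
# not at 127.0.0.1, which would resolve to the container itself.
```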