2026 OpenClaw: MCP Tooling, ClawHub Skills Install/Verify & Gateway Triage Runbook

About 13 min read · MACCOME

Audience: operators whose Gateway runs but whose MCP tools never appear, whose calls time out, or whose Skills vanish after a restart. Outcome: bootstrap stays in the install guide and the Docker production runbook; persistence stays in the volumes & Skills permissions article. This runbook covers the chain declare → process visibility → Gateway registration → model/tool/channel triage. Layout: six pitfalls, two matrices, a config sketch, six steps, three KPIs, and closing guidance.

Why does Gateway act like MCP is missing?

MCP is a JSON-RPC session between Gateway and a child process or remote endpoint. A config entry existing does not mean the child starts, and a child starting does not mean it returns tool schemas. Six frequent misreads follow.

  1. Environment only in interactive shells: daemons, systemd, launchd, or Compose never see PATH or API keys from ~/.zshrc.
  2. ClawHub Skills on read-only or anonymous volumes: downloads look fine until the container is recreated; see the volumes article.
  3. Stale tool caches: configs changed but UI/CLI lists stay old; reload per docs instead of assuming failure.
  4. Timeouts too tight: the first cold call, especially over a high-RTT link, needs a looser threshold than steady-state calls.
  5. Overlapping AGENTS.md / bootstrap text: duplicate instructions across MCP and Skills inflate context; split boundaries per the Skills tuning checklist.
  6. Channel issues mistaken for MCP: fix Slack/Telegram OAuth paths before blaming tools.

Run openclaw doctor using the order in the post-install doctor guide; this article adds the tool-registration evidence chain, not another install tutorial.

Keep a one-page “minimum repro card” per MCP server: one read query, one negative test that must be denied, and three expected log tokens. On-call can compare cards to spot config regressions without rereading giant prompts. Note allowed egress and data classification on the card so that no incident response widens a token's scope without a record.
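One way to keep such cards comparable is to store them as structured records rather than free text. The sketch below is a minimal illustration; the `ReproCard` class, its field names, and both helper functions are hypothetical and not part of OpenClaw.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReproCard:
    """One-page minimum repro card for a single MCP server (hypothetical shape)."""
    server: str
    read_query: str             # the one read query that must succeed
    denied_query: str           # the negative test that must be denied
    expected_log_tokens: tuple  # exactly three tokens expected in the logs
    allowed_egress: tuple       # hosts the server may reach
    data_classification: str

    def __post_init__(self):
        if len(self.expected_log_tokens) != 3:
            raise ValueError("a card lists exactly three expected log tokens")


def missing_tokens(card, log_text):
    """Return the expected log tokens that do not appear in log_text."""
    return [token for token in card.expected_log_tokens if token not in log_text]


def card_drift(old, new):
    """Return {field: (old_value, new_value)} for every field that changed."""
    before, after = asdict(old), asdict(new)
    return {name: (before[name], after[name])
            for name in before if before[name] != after[name]}
```

A non-empty `card_drift` result between the last reviewed card and the current one is the signal to check for a matching change ticket, especially when `allowed_egress` or `data_classification` is among the changed fields.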

Table 1: MCP symptom → evidence → action

Field names vary by OpenClaw version; what this table fixes is the order of operations.

| Symptom | Collect first | Likely root | Auditable action |
| --- | --- | --- | --- |
| Empty/partial tool list | Gateway logs, child exit codes | Missing binary, cwd, permission denied | Use absolute command/args/cwd; run the child as the same user as Gateway |
| First call slow, then OK | Cold-start timing, package fetch logs | npx -y or runtime JIT | Prewarm jobs; pin versions in images; relax first-call timeout |
| Steady timeouts | Child alive, CPU, FD usage | Deadlock, blocking IO | Sample/trace where allowed; A/B with a read-only tool |
| “Tool not registered” | Schema logs, protocol version | Implementation mismatch | Align MCP versions; pin minors; read upstream changelog |
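The first row's action (absolute paths, same user as Gateway) lends itself to a preflight check. The sketch below is an assumption-laden illustration: `preflight` is a hypothetical helper, and the entry shape (`command`, `cwd`, `env`) mirrors common MCP server declarations rather than a confirmed OpenClaw schema. The result is only meaningful when the script runs as the Gateway user with the Gateway's environment.

```python
import os
import re

# Matches ${NAME} placeholders inside env values.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def preflight(entry, env):
    """Return a list of problems for one server entry; an empty list means it passed.

    entry: dict with "command" and optionally "cwd" and "env" (hypothetical shape).
    env:   the environment the daemon actually sees, e.g. dict(os.environ)
           captured from the service, not from an interactive shell.
    """
    problems = []

    command = entry.get("command", "")
    if not os.path.isabs(command):
        problems.append(f"command {command!r} is not an absolute path")
    elif not (os.path.isfile(command) and os.access(command, os.X_OK)):
        problems.append(f"command {command!r} is missing or not executable by this user")

    cwd = entry.get("cwd")
    if cwd is not None and not os.path.isdir(cwd):
        problems.append(f"cwd {cwd!r} does not exist")

    for key, value in entry.get("env", {}).items():
        for name in PLACEHOLDER.findall(str(value)):
            if not env.get(name):
                problems.append(f"env {key}: ${{{name}}} is unset for this process")

    return problems
```

Attaching the returned list to the incident ticket gives the "auditable action" column something concrete to reference.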

Table 2: ClawHub Skills vs in-repo Skills vs MCP

Publish a capability matrix so one workflow is not described three different ways.

| Source | Best for | Versioning | Risk |
| --- | --- | --- | --- |
| ClawHub / marketplace | Rapid experiments | Pin commit or semver range; weekly diff | Upstream drift; needs regression tests |
| Repo SKILL.md / private packs | Compliance-heavy flows | Ship with mainline via PR | Maintenance load; align with MCP scope |
| MCP (system of record) | DBs, tickets, internal HTTP APIs | Independent release cadence | Over-broad tokens; maintain allowlists |

Config sketch

# Structural sketch only—real keys, nesting, and hot reload follow current OpenClaw docs.
# Goal: Gateway launches an MCP server over stdio as a fixed user.
#
# mcpServers:
#   internal-readonly-lookup:
#     command: /usr/local/bin/node
#     args: ["/opt/mcp-servers/lookup/dist/index.js"]
#     env:
#       LOOKUP_API_TOKEN: "${LOOKUP_TOKEN_READONLY}"
#
# ClawHub Skill: extract/clone into the team skills directory, then refresh the
# skill index or run the documented reload command for your version.
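
For the ClawHub Skill note in the sketch above, a recorded checksum makes it possible to tell whether the skills directory on persistent storage still matches what was installed. The following is a minimal sketch using only the standard library; `skill_digest` and `verify_skill` are hypothetical helpers, and ClawHub may publish its own checksums in a different form.

```python
import hashlib
from pathlib import Path


def skill_digest(skill_dir):
    """Return a SHA-256 hex digest over every file under skill_dir.

    Files are visited in sorted order and each contributes its relative
    path plus its own content hash, so the result is stable across runs
    and changes when a file is added, removed, renamed, or edited.
    """
    root = Path(skill_dir)
    overall = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        content_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        overall.update(f"{relative}\0{content_hash}\n".encode("utf-8"))
    return overall.hexdigest()


def verify_skill(skill_dir, recorded_digest):
    """True when the directory still matches the digest recorded at install time."""
    return skill_digest(skill_dir) == recorded_digest
```

Recording the digest next to the pinned version at install time, then calling `verify_skill` after a container recreate, separates "the volume lost the Skill" from "the index was not reloaded".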

Warning: MCP connects assistants to production data. Least privilege and audit trails beat “just make it work.” Split read vs write servers, split tokens, and attach allowlist snippets to the change ticket.

Six steps: from “chat works” to “tool calls are replayable”

  1. Freeze topology: bare metal, remote Mac, or container—document user, PATH, cwd, and bind mounts.
  2. Register MCP: fill command/args/env per docs; manually launch the child as the same identity as Gateway and confirm handshake logs.
  3. Install ClawHub Skills: land them on persistent storage and record version and checksum; never leave them only in the ephemeral container layer.
  4. Trim overlapping Skills text: move long retrieval to memory_search or doc tools to curb context growth.
  5. Automate three checks: cold start, steady call, and a deliberate failure (e.g., disconnect) to validate timeouts and degradation.
  6. Update the ops guide: reload order, rollback (remove server + restart Gateway), and owners, all on the same page on-call uses.
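
The three automated checks from step 5 can be sketched as a small harness. Everything here is illustrative: `three_checks`, the callables passed to it, and the default limits (30 s cold, 5 s steady) are assumptions to be replaced with values from your own measurements. The harness measures elapsed time after the fact; it does not interrupt a hung call, so the client's own timeout still has to be set.

```python
import time


def timed(call, limit_seconds):
    """Run call() once and report whether it succeeded within the limit."""
    start = time.monotonic()
    error = None
    try:
        call()
    except Exception as exc:  # a sketch: record any failure, do not re-raise
        error = f"{type(exc).__name__}: {exc}"
    elapsed = time.monotonic() - start
    return {
        "succeeded": error is None,
        "within_limit": elapsed <= limit_seconds,
        "seconds": elapsed,
        "error": error,
    }


def three_checks(call_tool, call_while_disconnected,
                 cold_limit=30.0, steady_limit=5.0):
    """Run cold start, steady call, and deliberate failure, in that order.

    call_tool:               invokes one read-only tool (first use is the cold call)
    call_while_disconnected: invokes the tool with the backend deliberately broken;
                             it is expected to raise, and to do so quickly
    """
    cold = timed(call_tool, cold_limit)
    steady = timed(call_tool, steady_limit)
    failure = timed(call_while_disconnected, steady_limit)
    return {
        "cold_start": cold["succeeded"] and cold["within_limit"],
        "steady_call": steady["succeeded"] and steady["within_limit"],
        # Passing means the call failed, and failed fast, instead of hanging
        # or silently returning a result.
        "deliberate_failure": (not failure["succeeded"]) and failure["within_limit"],
    }
```

Keeping the cold and steady limits as separate parameters mirrors the split between first-call and steady-state thresholds, and the third check guards against a degradation path that hangs instead of failing.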

Three KPIs worth weekly review

  1. Registration coverage: declared MCP servers vs tools actually listed, sliced by release.
  2. First-call P95 vs steady P95: treat warm-up separately from steady state.
  3. Duplicate capability count: the number of places (MCP, ClawHub, AGENTS.md) that describe the same action; any count above 1 needs a signed waiver.
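
The three KPIs above can be computed from data most teams already collect. The sketch below makes several assumptions worth stating: coverage is read as the share of declared servers with at least one listed tool, P95 uses the nearest-rank method, and the input shapes and function names are hypothetical.

```python
from collections import Counter


def registration_coverage(declared_servers, tools_by_server):
    """Share of declared servers that list at least one tool (0.0 to 1.0)."""
    if not declared_servers:
        raise ValueError("no declared servers to measure")
    registered = [name for name in declared_servers if tools_by_server.get(name)]
    return len(registered) / len(declared_servers)


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def duplicate_capabilities(actions_by_source):
    """Return {action: number_of_sources} for actions described in more than one place.

    actions_by_source: e.g. {"mcp": [...], "clawhub": [...], "agents_md": [...]}
    """
    counts = Counter(
        action
        for actions in actions_by_source.values()
        for action in set(actions)  # count each source at most once per action
    )
    return {action: n for action, n in sorted(counts.items()) if n > 1}
```

Computing `p95` separately over first-call and steady-state samples keeps warm-up from hiding inside a single blended number, and every key returned by `duplicate_capabilities` is a candidate for the waiver list.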

On remote Macs or cloud hosts, disk and log rotation affect MCP children that spill temp files to small system volumes—timeouts may look random though the model config is unchanged. Review host ops alongside tool config.

For HTTP/SSE MCP fronts, include reverse-proxy idle timeouts, Upgrade handling, and TLS termination: Gateway may log a successful handshake while the edge proxy returns 499/504. Cross-check the Nginx/Caddy reverse-proxy guide before only raising OpenClaw timeouts.

Directional community note (not a benchmark): three heavy MCP servers plus wide retrieval often produce minute-scale queue jitter. For SLA purposes, capability matrices and allowlists beat an ever-growing plugin list.

Why laptops and ad-hoc hosts struggle with long-lived tool governance

Sleep, VPN flaps, and path drift make child processes and skill indexes unpredictable. Connecting real business data demands 24/7 uptime, persistent paths, and auditable permissions.

Self-managed boxes that lack multi-region choice or flexible terms push teams toward shared hosts, where cold starts and log IO compete for the same resources. Placing Gateway on dedicated Apple Silicon with predictable disks and egress, as is typical of a professional Mac cloud, usually makes MCP and Skills policies enforceable in contracts. MACCOME offers multi-region Mac Mini M4 / M4 Pro with flexible rental terms as a stable base for Gateway and build farms; confirm public rates and help-center SLAs before ordering.

Pilot the three checks from this runbook on a remote Mac before promoting one image fleet-wide—avoid “works locally, times out in prod” loops. If Gateway is internet-facing, ship TLS, rate limits, and IP allowlists in the same change, not as a later patch.

FAQ

How does this pair with channel onboarding?

Channel guides cover Slack/Discord/Telegram OAuth; this article covers tool discovery. If messages reach Gateway but tools fail, gather evidence from Table 1 before revisiting channel “connected but silent” cases.

What should rollback include?

Remove MCP entries, document restart order, run a read-only verification query, and confirm tool counts return to baseline on dashboards. Reconcile billing against the published rental rates.

Container vs bare-metal paths differ—now what?

Maintain an absolute-path matrix per runtime; never let the model guess paths in chat. Cross-check the help center with the Docker volumes article.