Jobs missing after upgrade?

Diff mounted config volumes and rerun doctor.

2026 OpenClaw cron scheduling and unattended Gateway: timezones, failure recovery, and remote Mac topology

Once your OpenClaw Gateway is healthy, the next reliability layer is scheduled work: channel probes, housekeeping, and recurring reports. This guide defines pre-flight gates, a minimal openclaw cron loop (when your CLI exposes it), UTC-first timezone rules, and a cold-start checklist to run after restarts. It also explains why a dedicated remote Mac is a better home for always-on automation.

Why “the bot replies” does not imply “cron is safe”

Missing health baselines: silent failures when disks fill or permissions drift.
Timezone skew: mixing local crontab semantics with UTC CI causes double or missed runs.
Overlapping schedulers: systemd timers and OpenClaw jobs hammering the same script without locking.

Pre-flight matrix before enabling cron-style jobs

Make “ready to schedule” an explicit gate, not a default.

Check	Pass condition	If it fails
Gateway	`openclaw gateway status` healthy	fix bind/service first
Disk	>20% free on log volume	rotate or expand
Doctor	no blocking findings	resolve tokens/config drift
Channels	probe succeeds	do not schedule channel-dependent jobs yet
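The matrix above can be turned into a small gate script. This is a minimal sketch, not a definitive implementation: it assumes `openclaw gateway status` and `openclaw doctor` exit non-zero when unhealthy (verify this on your build), and the log directory and threshold arguments are hypothetical. The channel probe is left out because its command depends on your setup.

```bash
# Print the free-space percentage (integer) of the volume holding $1.
disk_free_pct() {
  # df -P prints POSIX-format output; column 5 is the used capacity, e.g. "42%".
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print 100 - $5 }'
}

# Return 0 only if every gate passes; otherwise print the first failing gate.
# $1 = directory on the log volume, $2 = minimum free percent (both hypothetical defaults).
preflight() {
  log_dir="${1:-/var/log}"
  min_free="${2:-20}"
  # Assumption: these subcommands signal failure through their exit status.
  openclaw gateway status >/dev/null 2>&1 || { echo "FAIL gateway: fix bind/service first"; return 1; }
  openclaw doctor >/dev/null 2>&1 || { echo "FAIL doctor: resolve tokens/config drift"; return 1; }
  free="$(disk_free_pct "$log_dir")"
  [ "$free" -gt "$min_free" ] || { echo "FAIL disk: ${free}% free, rotate or expand"; return 1; }
  echo "PASS: ready to schedule"
}
```

Calling `preflight /var/log 20` before enabling any job makes the gate explicit: a non-zero exit means nothing gets scheduled yet.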

Minimal loop: list, enable, verify, rollback

Assume your build ships openclaw cron subcommands aligned with community docs (cron list, cron status). Work in this order: list → status → enable one job → watch one fire → document. Capture the stdout of each step so you have a baseline to compare against, and to roll back to, after upgrades.

Timezones: author in UTC, read in local

Store schedules in UTC; print offsets in logs. For “09:00 weekday” rules, name the city whose 09:00 you mean.
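One way to "print offsets in logs" is to stamp each line with UTC time plus the current offset of the city the schedule was written for. The sketch below assumes the host has the IANA timezone database installed; the function name and output format are illustrative only.

```bash
# Print a UTC timestamp plus the current UTC offset of a named zone.
# $1 = IANA zone name such as Europe/Berlin (requires tzdata on the host).
utc_stamp_with_offset() {
  zone="${1:-UTC}"
  utc="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # TZ applies only to this one date call; %z prints the offset as +hhmm.
  offset="$(TZ="$zone" date +%z)"
  echo "ts=$utc zone=$zone offset=$offset"
}
```

Because the offset is read at run time, the same log line shows when a daylight-saving change has moved "09:00 in that city" relative to the UTC schedule.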

Tip: snapshot openclaw cron list before and after each upgrade.
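The snapshot habit can be scripted in a few lines. This sketch assumes `openclaw cron list` exists on your build and writes its listing to stdout; the snapshot directory and label are hypothetical.

```bash
# Save the current job listing to a timestamped file and print the file path.
# $1 = snapshot directory (hypothetical), $2 = label such as "pre" or "post".
snapshot_cron_list() {
  dir="$1"; label="$2"
  mkdir -p "$dir" || return 1
  out="$dir/cron-list-$label-$(date -u +%Y%m%dT%H%M%SZ).txt"
  # "|| true" keeps the snapshot step non-fatal if the subcommand is absent.
  openclaw cron list >"$out" 2>&1 || true
  echo "$out"
}

# Usage around an upgrade:
#   before="$(snapshot_cron_list "$HOME/openclaw-snapshots" pre)"
#   ... perform the upgrade ...
#   after="$(snapshot_cron_list "$HOME/openclaw-snapshots" post)"
#   diff -u "$before" "$after" && echo "no job drift"
```

A non-empty diff after an upgrade is the cue to check mounted config volumes before trusting any schedule.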

Cold start after Gateway restart

Verify process, registrations, and last-fired timestamps. If timestamps stall, inspect disk and permissions before blaming the scheduler. Cross-read the Gateway troubleshooting runbook.

Dedicated remote Mac topology

Laptops sleep; servers should not. Colocating Gateway and scheduled jobs on dedicated Apple Silicon removes power-policy noise. See the unattended launchd/systemd checklist.

Split duties with OS cron

Use OS cron for machine hygiene; use OpenClaw schedulers for tasks that need session context and channel credentials. Serialize with locks if both touch the same script.
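Serializing two schedulers around one script does not need extra tooling. The sketch below uses an atomic `mkdir` as the lock, which works on both macOS and Linux (stock macOS does not ship `flock`). The lock path is hypothetical, and a crash between `mkdir` and `rmdir` leaves a stale lock that must be cleared by hand.

```bash
# Run a command under an exclusive lock; skip the run if the lock is held.
# mkdir is atomic, so only one caller can create the lock directory.
# $1 = lock directory path (hypothetical), remaining args = command to run.
# The function body is a subshell, so "exit" only leaves the function.
with_lock() (
  lock="$1"; shift
  if ! mkdir "$lock" 2>/dev/null; then
    echo "lock busy: $lock" >&2
    exit 75   # conventional "temporary failure, try again later" status
  fi
  "$@"; status=$?
  rmdir "$lock"   # release the lock whether or not the command succeeded
  exit "$status"
)
```

Both the OS cron entry and the OpenClaw job would then call the script as `with_lock /tmp/rotate-logs.lock ./rotate-logs.sh`, so whichever fires second backs off instead of overlapping.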

Six-step rollout

Pin versions and config paths.
Run doctor and gateway status.
List jobs with owners and alert routes.
Enable a low-frequency probe first.
Raise frequency while watching disk/CPU.
Keep a one-command disable and last-known-good snapshot.

```bash
openclaw gateway status
openclaw doctor
openclaw cron list || true
openclaw cron status || true
openclaw logs --follow
```

Three SRE metrics

Skew between the last successful fire and its SLA deadline
Retry budget with exponential backoff for third-party APIs
Log growth MB/h per job
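The third metric is simple arithmetic over two size samples. A minimal sketch, with hypothetical function names, treating 1 MB as 1048576 bytes:

```bash
# Current size of a log file in bytes (wc -c behaves the same on macOS and Linux).
log_bytes() {
  wc -c <"$1" | tr -d ' '
}

# Convert two size samples into MB per hour.
# $1 = bytes at first sample, $2 = bytes at second sample, $3 = seconds between them.
log_growth_mb_per_hour() {
  awk -v a="$1" -v b="$2" -v s="$3" \
    'BEGIN { if (s <= 0) exit 1; printf "%.2f\n", (b - a) / 1048576 * 3600 / s }'
}
```

Sampling `log_bytes` for each job's log at a fixed interval and feeding the pairs into `log_growth_mb_per_hour` gives a per-job series you can chart next to free disk space.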

Why OS-only cron is usually insufficient

OS cron lacks first-class awareness of OpenClaw session state and channel tokens; upgrades can silently break wrapper scripts. MACCOME dedicated remote Macs give stable power, disks, and regions so Gateway plus schedules stay boringly correct.

Taxonomy of scheduled jobs: probes, housekeeping, and business workflows

Not every timer should look the same. Probes should be cheap, idempotent, and safe to run in parallel with manual operations. Housekeeping tasks rotate logs, prune caches, and verify disk quotas; they should never share mutable state with user-driven sessions. Business workflows (daily digests, channel summaries) need explicit owners, retry budgets, and idempotency keys so a duplicate fire does not spam customers.
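An idempotency key for a business workflow can be as simple as the job name plus its scheduled UTC minute. The sketch below is one possible shape under that assumption; the state directory and the key format are hypothetical, and the key is claimed before the command runs, which makes the job at-most-once (a failed run is not retried under the same key).

```bash
# Run a command only if this job/minute pair has not been claimed before.
# $1 = state directory (hypothetical), $2 = job name,
# $3 = scheduled minute such as 2026-01-05T09:00Z, remaining args = command.
# The function body is a subshell, so "exit" only leaves the function.
run_once() (
  state="$1"; job="$2"; minute="$3"; shift 3
  # Replace characters that are awkward in file names with underscores.
  key="$(printf '%s' "$job-$minute" | tr -c 'A-Za-z0-9._-' '_')"
  mkdir -p "$state" || exit 1
  # mkdir without -p fails if the key already exists, so a duplicate fire is skipped.
  if ! mkdir "$state/$key" 2>/dev/null; then
    echo "skip duplicate: $key" >&2
    exit 0
  fi
  "$@"
)
```

Wrapping a digest as `run_once /var/lib/myjobs/keys daily-digest 2026-01-05T09:00Z ./send-digest.sh` means a second fire for the same minute logs a skip instead of messaging customers twice.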

Backpressure, rate limits, and third-party APIs

When a cron-style job calls external APIs, wrap calls with exponential backoff and jitter. Record the HTTP status distribution per job name so you can spot creeping 429 rates before providers hard-throttle you. For model providers, align job concurrency with token budgets and separate interactive traffic from batch summarization.
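Backoff with jitter fits in a short wrapper. This is a sketch rather than a production retry policy: it retries on any non-zero exit, whereas a real job would usually retry only on 429 and 5xx responses. The function name and arguments are hypothetical.

```bash
# Retry a command with exponential backoff and full jitter.
# $1 = max attempts, $2 = base delay in seconds, remaining args = command.
# The function body is a subshell, so "exit" only leaves the function.
retry_with_backoff() (
  max="$1"; base="$2"; shift 2
  attempt=1
  while :; do
    "$@" && exit 0
    [ "$attempt" -ge "$max" ] && exit 1
    # The ceiling doubles on each attempt: base, 2*base, 4*base, ...
    ceiling=$(( base * (1 << (attempt - 1)) ))
    # Full jitter: sleep a random whole number of seconds in [0, ceiling].
    delay="$(awk -v c="$ceiling" 'BEGIN { srand(); print int(rand() * (c + 1)) }')"
    echo "attempt $attempt failed; sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
)
```

A call such as `retry_with_backoff 5 2 curl -fsS https://api.example.com/health` (the URL is a placeholder) spreads retries out so that many jobs failing at once do not hit the provider in lockstep.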

Alerting, SLOs, and “silent success” traps

Define an SLO such as “daily digest must finish within 15 minutes of the scheduled minute in UTC.” Alert on the absence of success logs, not only on explicit errors. Silent stalls often correlate with disk pressure or stale OAuth refresh tokens—both show up faster if you chart log bytes per hour and credential expiry dates alongside cron fires.
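Alerting on absence is easiest when each job touches a marker file as its last step, and a separate watchdog checks the marker's age. A minimal sketch, assuming that convention; the marker path and `notify_oncall` are hypothetical.

```bash
# Return 0 if the success marker exists and was modified within the window.
# $1 = marker file touched by the job on success, $2 = maximum age in minutes.
success_is_fresh() {
  # find prints the path only if it was modified less than $2 minutes ago;
  # a missing file produces no output, so it also counts as stale.
  [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# Usage from a watchdog that runs 15 minutes after the scheduled minute:
#   success_is_fresh /var/run/myjobs/daily-digest.ok 15 || notify_oncall "digest missed SLO"
```

Because the check fails for a missing marker as well as an old one, a job that never started is reported the same way as one that stalled.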

Security boundaries for automated jobs

Scheduled jobs should use least-privilege credentials distinct from admin CLI tokens. Scope channel permissions narrowly, store secrets in the same volumes you already back up before upgrades, and document which jobs are allowed to mutate production state versus read-only inspection. When in doubt, split destructive operations behind a manual approval channel while keeping probes fully automated.

Migration checklist from systemd timers

If you are moving logic from systemd timers into OpenClaw, run both systems side by side for a limited period, each behind a feature flag: run the new job at half frequency while the legacy timer still emits metrics. Compare outputs for two weeks, then disable the timer only after timestamps and side effects match. Never run dual writers to the same SQLite or JSON state file without file locking.

Observability: model each cron fire as a mini transaction

Operators debug faster when a single scheduled execution has a coherent story: enqueue timestamp, worker pick-up, channel send, model call, persistence, and acknowledgement back to the user or datastore. Emit structured fields (job, run_id, attempt, region) on every line so you can pivot from “nothing failed” to “the job never reached the network layer.” Keep correlation IDs stable across retries so duplicate deliveries are obvious instead of masquerading as unrelated errors.
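The structured fields can be emitted with a single helper. The sketch below uses key=value lines; `JOB`, `RUN_ID`, `ATTEMPT`, and `REGION` are hypothetical environment variables set once per scheduled execution, with `RUN_ID` kept unchanged across retries.

```bash
# Emit one key=value log line carrying job, run_id, attempt, and region.
# $1 = stage name (enqueue, pickup, channel_send, ...), $2 = free-text message.
log_event() {
  printf 'ts=%s job=%s run_id=%s attempt=%s region=%s stage=%s msg="%s"\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "${JOB:-unknown}" "${RUN_ID:-none}" "${ATTEMPT:-1}" "${REGION:-unknown}" \
    "$1" "$2"
}

# Usage inside a job:
#   JOB=daily-digest; RUN_ID=2026-01-05T09:00Z; REGION=eu
#   log_event enqueue "scheduled"
#   ATTEMPT=2; log_event channel_send "retrying after 429"
```

Filtering on one `run_id` then shows every stage the execution reached, and the last stage logged marks where it stopped.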

Pair textual logs with lightweight gauges: queue depth, oldest waiting job age, and last successful completion per job name. When queue depth grows while CPU stays flat, you are usually blocked on external APIs or file locks—not on model compute. That distinction saves hours of mis-tuned concurrency limits.

Clock skew, sleep, and compensating missed triggers

Laptops that sleep through a minute boundary will miss local timers; VMs paused for snapshots exhibit the same symptom. Prefer running authoritative schedulers on hosts with predictable clocks and disabled aggressive sleep policies—exactly the profile of a dedicated remote Mac. If you must tolerate occasional misses, document whether jobs are at-most-once or catch-up safe, and implement explicit catch-up windows instead of silently double-firing during the next wake cycle.
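An explicit catch-up window reduces to one comparison per missed trigger. A minimal sketch with hypothetical names, working in epoch seconds so that timezones do not enter the decision:

```bash
# Decide what to do with a trigger scheduled at $1 when the host notices it at $2.
# $1 = scheduled time (epoch seconds), $2 = current time (epoch seconds),
# $3 = catch-up window in seconds.
catch_up_decision() {
  scheduled="$1"; now="$2"; window="$3"
  late=$(( now - scheduled ))
  if [ "$late" -lt 0 ]; then
    echo "wait"    # not due yet
  elif [ "$late" -le "$window" ]; then
    echo "run"     # on time, or late but still inside the catch-up window
  else
    echo "skip"    # missed; wait for the next scheduled slot
  fi
}
```

With a 10-minute window, `catch_up_decision "$scheduled" "$(date +%s)" 600` runs a job that woke 5 minutes late and skips one that woke an hour late, which is the behavior to record in the job's documentation.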

Validate NTP health as part of your weekly SRE checklist. A few seconds of skew rarely breaks minute-level jobs, but hour-aligned financial reconciliations can land on the wrong calendar day when offsets stack with daylight-saving transitions.

Lease windows, upgrades, and why a dedicated cloud Mac reduces scheduling entropy

When hardware is shared with interactive developers, upgrades and reboots become social negotiations. Moving Gateway plus OpenClaw cron to an isolated rental node gives you published maintenance windows, stable egress, and fewer surprise policy changes from corporate MDM tooling. Treat lease renewal the same way you treat certificate expiry: alert at 30/14/7 days, rehearse migration steps, and snapshot configuration volumes before any major OpenClaw upgrade.
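The 30/14/7-day schedule maps directly onto a small threshold check. In this sketch the level names are hypothetical, and the expiry date is assumed to be available as epoch seconds.

```bash
# Map days until lease expiry to an alert level using the 30/14/7 thresholds.
# $1 = lease expiry (epoch seconds), $2 = current time (epoch seconds).
lease_alert_level() {
  days=$(( ($1 - $2) / 86400 ))
  if   [ "$days" -le 7 ];  then echo "critical"
  elif [ "$days" -le 14 ]; then echo "warning"
  elif [ "$days" -le 30 ]; then echo "notice"
  else echo "ok"
  fi
}
```

Running `lease_alert_level "$LEASE_EXPIRY_EPOCH" "$(date +%s)"` from a daily housekeeping job (with `LEASE_EXPIRY_EPOCH` set by you) puts lease renewal on the same alert path as certificate expiry.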

MACCOME’s single-tenant Apple Silicon fleet across six regions is intentionally boring infrastructure: you trade a small recurring rental for removing “my laptop closed” and “IT pushed a patch” from the failure graph of your unattended assistants.

Runbook fields every scheduled job should carry

Beyond cron expressions, capture blast radius (which customers see output), rollback commands, dependency services, and an on-call escalation path. Add a “dry-run mode” flag where safe so new hires can rehearse without mutating production. Review these fields quarterly; stale runbooks are how silent regressions survive multiple reorganizations.

Capacity planning: CPU is rarely the first bottleneck

Batch summarization jobs often look CPU-light while they saturate network egress or SQLite write locks. Before doubling concurrency, chart p95 latency for downstream APIs, rows updated per minute, and filesystem fsync latency. If the Mac is shared with interactive developers, give automation a lower process priority (macOS has no cgroups; nice or a background launchd ProcessType fills that role) so a nightly digest cannot starve a human-driven debugging session. Alternatively, move the digest to a dedicated remote host where contention is contractual, not social.

Post-incident review template for missed or duplicate fires

When a job misfires, capture: expected trigger wall time in UTC, observed first log line timestamp, whether the process was running, disk free percentage, and last successful OAuth refresh. Duplicate fires deserve the same rigor: dump idempotency keys and channel message IDs so you can prove whether customers received one or two notifications. Feed those artifacts back into the runbook so the next on-call engineer inherits evidence instead of folklore.

CLI drift, semver, and pinning the automation stack

OpenClaw’s surface area evolves: subcommands rename, defaults shift, and configuration search paths reorder. Pin the automation layer the same way you pin compilers—record the exact semver, hash the binary or container digest in your infrastructure repo, and gate upgrades behind a canary host that runs only synthetic cron jobs for 48 hours. Your scheduled jobs should print version banners at startup so log miners can correlate behavioral changes with releases without guessing from timestamps alone.
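Hashing the binary and checking it against a pinned value can run as the first step of every scheduled job. This is a sketch under stated assumptions: the pinned digest lives wherever your infrastructure repo keeps it, and the function names are hypothetical. It does not print a version banner, because the exact version flag depends on your build.

```bash
# Print the SHA-256 digest of a file, using whichever tool the host provides.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'   # stock macOS ships shasum
  fi
}

# Return 0 only if the file at $1 matches the pinned digest $2.
verify_pin() {
  actual="$(file_sha256 "$1")"
  if [ "$actual" != "$2" ]; then
    echo "digest mismatch for $1: got $actual" >&2
    return 1
  fi
  echo "pinned digest ok: $actual"
}
```

A job that starts with `verify_pin "$(command -v openclaw)" "$PINNED_SHA256" || exit 1` refuses to run on an unreviewed build and leaves the digest in the log for later correlation with releases.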

When multiple engineers install different nightly builds on laptops that also trigger manual “fix-it” jobs, you inherit irreproducible failures. Centralizing automation on a leased remote Mac with a single blessed image collapses that variance: everyone SSHes into the same toolchain, and cron semantics stay aligned with the gateway process that actually owns channel credentials.