Once your OpenClaw Gateway is healthy, the next reliability layer is scheduled work: channel probes, housekeeping, and recurring reports. This guide defines pre-flight gates, a minimal openclaw cron loop (when your CLI exposes it), UTC-first timezone rules, and a cold-start checklist for after restarts. It also explains why a dedicated remote Mac is a better home for always-on automation.
Make “ready to schedule” an explicit gate, not a default.
| Check | Pass condition | If it fails |
|---|---|---|
| Gateway | openclaw gateway status reports healthy | Fix the bind or service configuration first |
| Disk | More than 20% free on the log volume | Rotate logs or expand the volume |
| Doctor | openclaw doctor reports no blocking findings | Resolve token or config drift |
| Channels | Channel probe succeeds | Do not schedule channel-dependent jobs yet |
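Here is a minimal Python sketch of the gate. It assumes you collect each check's result yourself. GateResult, disk_free_ratio and ready_to_schedule are hypothetical helpers. Only the disk check is computed for real (via shutil.disk_usage), because how to read the openclaw commands' output depends on your build.

```python
import shutil
from dataclasses import dataclass


@dataclass
class GateResult:
    """One row of the gate table: did it pass, and what to do if not."""
    name: str
    passed: bool
    remedy: str


def disk_free_ratio(path: str) -> float:
    """Fraction (0.0 to 1.0) of free space on the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def ready_to_schedule(results: list[GateResult]) -> tuple[bool, list[str]]:
    """Ready only when every gate passed; otherwise list the remedies to apply."""
    failures = [f"{r.name}: {r.remedy}" for r in results if not r.passed]
    return (not failures, failures)


if __name__ == "__main__":
    log_volume = "/tmp"  # assumption: point this at your real log volume
    results = [
        GateResult("Disk", disk_free_ratio(log_volume) > 0.20, "rotate or expand"),
        # Gateway, Doctor and Channels would be filled in from
        # `openclaw gateway status`, `openclaw doctor` and a channel probe.
        # Their output format depends on your build, so they are not guessed here.
    ]
    ok, todo = ready_to_schedule(results)
    print("ready to schedule" if ok else "\n".join(todo))
```

Scheduling stays blocked until every row passes. The failure list doubles as the to-do list for the operator.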
This guide assumes your build ships openclaw cron subcommands that match the community docs (cron list, cron status). Work in this order: list → status → enable one job → watch it fire once → document. Capture stdout at each step so you have a baseline to compare against after upgrades.
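Store schedules in UTC and print the UTC offset in logs. For rules like "09:00 on weekdays", name the city (an IANA zone such as Europe/Berlin) whose 09:00 you mean, because the UTC equivalent shifts with daylight saving time.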
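The sketch below shows why naming the city matters. It uses Python's zoneinfo module and assumes the host has IANA tz data installed. Europe/Berlin is only an example zone, and both function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def local_fire_to_utc(day: date, at: time, city_tz: str) -> datetime:
    """Interpret `at` on `day` as wall-clock time in `city_tz` and return UTC."""
    local = datetime.combine(day, at, tzinfo=ZoneInfo(city_tz))
    return local.astimezone(timezone.utc)


def next_weekday_fire(now_utc: datetime, at: time, city_tz: str) -> datetime:
    """Next Monday-to-Friday occurrence of `at` (city wall clock) after now_utc, in UTC."""
    day = now_utc.astimezone(ZoneInfo(city_tz)).date()
    while True:
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            fire = local_fire_to_utc(day, at, city_tz)
            if fire > now_utc:
                return fire
        day += timedelta(days=1)


if __name__ == "__main__":
    # "09:00 weekday, Berlin" maps to different UTC times in winter and summer.
    for d in (date(2024, 1, 15), date(2024, 7, 15)):
        utc = local_fire_to_utc(d, time(9, 0), "Europe/Berlin")
        print(utc.isoformat())  # log the UTC time with its explicit offset
```

Running it prints 2024-01-15T08:00:00+00:00 and then 2024-07-15T07:00:00+00:00. The same Berlin 09:00 lands an hour apart in UTC because Berlin is on UTC+1 in winter and UTC+2 in summer.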
Tip: snapshot openclaw cron list before and after each upgrade.
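One way to make the tip mechanical is to save each snapshot to a UTC-stamped file and diff the before and after copies. This sketch assumes only that openclaw cron list writes its listing to stdout. capture and diff_snapshots are hypothetical helpers.

```python
import difflib
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def capture(cmd: list[str], out_dir: Path, label: str) -> Path:
    """Run `cmd`, save its stdout to a UTC-timestamped file, and return the path.

    check=False keeps the snapshot even if the command exits non-zero.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{label}-{stamp}.txt"
    path.write_text(proc.stdout)
    return path


def diff_snapshots(before: str, after: str) -> list[str]:
    """Unified diff of two snapshots; an empty list means nothing changed."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before", tofile="after", lineterm=""))
```

Before the upgrade, call `capture(["openclaw", "cron", "list"], Path("snapshots"), "pre-upgrade")`. Capture again afterwards and print `diff_snapshots` of the two files' contents. An empty diff means the job registrations survived unchanged.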
Verify that the process is running, that job registrations are still present, and that last-fired timestamps keep advancing. If timestamps stall, inspect disk space and file permissions before blaming the scheduler. Cross-read the Gateway troubleshooting runbook.
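Here is a hedged sketch of both checks: a stall test on last-fired timestamps, and the disk and permission checks to run before suspecting the scheduler. Where the timestamps come from (logs, or openclaw cron status output) is left open. The 1.5x grace factor is an assumption, and the 20% threshold mirrors the gate table.

```python
import os
import shutil
from datetime import datetime, timedelta, timezone


def stalled(last_fired: datetime, interval: timedelta, now: datetime,
            grace: float = 1.5) -> bool:
    """True if the job has not fired within `grace` times its interval."""
    return now - last_fired > interval * grace


def pre_scheduler_checks(log_dir: str, min_free: float = 0.20) -> list[str]:
    """Disk and permission problems to rule out before blaming the scheduler."""
    if not os.path.isdir(log_dir):
        return [f"{log_dir} does not exist"]
    problems = []
    usage = shutil.disk_usage(log_dir)
    if usage.free / usage.total <= min_free:
        problems.append(f"{log_dir}: only {usage.free / usage.total:.0%} free")
    if not os.access(log_dir, os.W_OK):
        problems.append(f"{log_dir}: not writable by uid {os.getuid()}")
    return problems
```

Only when `pre_scheduler_checks` comes back empty and `stalled` is still true is the scheduler itself the prime suspect.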
Laptops sleep; servers should not. Colocating Gateway and scheduled jobs on dedicated Apple Silicon removes power-policy noise. See the unattended launchd/systemd checklist.
Use OS cron for machine hygiene; use OpenClaw schedulers for tasks that need session context and channel credentials. Serialize with locks if both touch the same script.
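If an OS cron entry and an OpenClaw job both call the same script, an advisory lock serializes them. This sketch uses fcntl.flock, which Python exposes on both macOS and Linux. The helper name and any lock path you pass are hypothetical. In non-blocking mode the second caller fails fast and can skip its run instead of piling up behind the first.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path: str, blocking: bool = False):
    """Hold an exclusive flock on lock_path for the duration of the with-block.

    With blocking=False, raises BlockingIOError if another holder has the lock.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Wrap the shared work as `with exclusive_lock("/tmp/nightly-prune.lock"): ...` in both the OS cron wrapper and the OpenClaw-triggered job. flock locks are advisory, so they only help if every writer uses the same lock file.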
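```shell
# Cold-start checks, in order
openclaw gateway status      # is the Gateway up and bound correctly?
openclaw doctor              # blocking findings, token or config drift
openclaw cron list || true   # "|| true" keeps going if this build lacks cron subcommands
openclaw cron status || true
openclaw logs --follow       # tail logs and watch the next job fire
```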
OS cron lacks first-class awareness of OpenClaw session state and channel tokens; upgrades can silently break wrapper scripts. MACCOME dedicated remote Macs give stable power, disks, and regions so Gateway plus schedules stay boringly correct.
Not every timer should look the same. Probes should be cheap, idempotent, and safe to run in parallel with manual operations. Housekeeping tasks rotate logs, prune caches, and verify disk quotas; they should never share mutable state with user-driven sessions. Business workflows (daily digests, channel summaries) need explicit owners, retry budgets, and idempotency keys so a duplicate fire does not spam customers.
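For business workflows, one common approach derives the idempotency key from the job name, the scheduled UTC minute and the recipient. This approach is assumed here, not taken from OpenClaw. With it, a duplicate fire of the same slot produces the same key. DeliveryLedger is a hypothetical in-memory stand-in for what would need to be a persistent store.

```python
import hashlib
from datetime import datetime, timezone


def idempotency_key(job: str, scheduled_utc: datetime, recipient: str) -> str:
    """Same job, scheduled minute and recipient always give the same key.

    The scheduled minute (not the actual start time) is used, so a duplicate
    fire or a retry of the same slot maps to the same key.
    """
    slot = scheduled_utc.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    return hashlib.sha256(f"{job}|{slot}|{recipient}".encode()).hexdigest()


class DeliveryLedger:
    """In-memory record of keys already delivered (illustration only)."""

    def __init__(self) -> None:
        self._sent: set[str] = set()

    def send_once(self, key: str, send) -> bool:
        """Call send() only for an unseen key; return True if a message went out."""
        if key in self._sent:
            return False
        send()               # if send() raises, the key stays unrecorded and can retry
        self._sent.add(key)
        return True
```

In production, make the check and the record a single atomic operation in durable storage. Otherwise a crash between `send()` and recording the key can still produce a duplicate.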
When a cron-style job calls external APIs, wrap calls with exponential backoff and jitter. Record the HTTP status distribution per job name so you can spot creeping 429 rates before providers hard-throttle you. For model providers, align job concurrency with token budgets and separate interactive traffic from batch summarization.
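A minimal sketch of exponential backoff with "full jitter", plus a per-job tally of HTTP statuses. Several choices here are assumptions: retrying only 429 and 5xx, the base and cap values, and `call` as a stand-in for whatever client your job uses. `sleep` and `rng` are injectable so the logic can be tested without waiting.

```python
import random
import time
from collections import Counter, defaultdict

# Per-job HTTP status distribution, e.g. status_by_job["digest"][429] += 1.
# Charting the 429 share per job over time shows creeping throttling early.
status_by_job: dict = defaultdict(Counter)


def call_with_backoff(job: str, call, max_attempts: int = 5, base: float = 0.5,
                      cap: float = 30.0, sleep=time.sleep, rng=random.random) -> int:
    """Call `call()` (which returns an HTTP status code) with capped full-jitter backoff.

    429 and 5xx are retried; anything else is returned immediately. Before retry
    n (n = 1, 2, ...), sleep a uniform random delay in [0, min(cap, base * 2**(n-1))).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status = call()
        status_by_job[job][status] += 1
        retryable = status == 429 or status >= 500
        if not retryable or attempt == max_attempts - 1:
            return status
        sleep(rng() * min(cap, base * 2 ** attempt))
    raise AssertionError("unreachable")
```

Full jitter spreads retries from many jobs across the whole backoff window. Without it, a synchronized cron fleet keeps retrying in lockstep against the same provider.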
Define an SLO such as “daily digest must finish within 15 minutes of the scheduled minute in UTC.” Alert on the absence of success logs, not only on explicit errors. Silent stalls often correlate with disk pressure or stale OAuth refresh tokens—both show up faster if you chart log bytes per hour and credential expiry dates alongside cron fires.
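The SLO check can be written so it fires on missing success lines rather than on errors. This sketch assumes you already parse the job's success timestamps from its logs. The 15-minute budget matches the example SLO, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def slo_breached(scheduled_utc: datetime, successes: list[datetime],
                 now: datetime, budget: timedelta = timedelta(minutes=15)) -> bool:
    """True when no success was logged within `budget` of the scheduled minute.

    Keyed on the absence of a success line, so a silent stall still alerts.
    """
    deadline = scheduled_utc + budget
    if now < deadline:
        return False  # still inside the window; too early to judge
    return not any(scheduled_utc <= s <= deadline for s in successes)
```

Evaluate it on a timer of its own, not from inside the job. A stalled job cannot report its own stall.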
Scheduled jobs should use least-privilege credentials distinct from admin CLI tokens. Scope channel permissions narrowly, store secrets in the same volumes you already back up before upgrades, and document which jobs are allowed to mutate production state versus read-only inspection. When in doubt, split destructive operations behind a manual approval channel while keeping probes fully automated.
If you are moving logic from systemd timers into OpenClaw, run both systems side by side for a while, gated by feature flags. Run the new job at half frequency while the legacy timer keeps emitting metrics. Compare outputs for two weeks, then disable the timer only after timestamps and side effects match. Never run dual writers to the same SQLite or JSON state file without file locking.
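Here is a sketch of the cutover rule, assuming each system records a per-day fingerprint of its output or side effects, such as a hash. The new job runs at half frequency, so only days on which it produced a record are compared, and those must cover at least half the window. The function name and data shapes are hypothetical.

```python
from datetime import date, timedelta


def ready_to_retire_legacy(legacy: dict[date, str], new: dict[date, str],
                           start: date, days: int = 14) -> tuple[bool, list[date]]:
    """Return (ready, mismatched_days) for the shadow window starting at `start`."""
    window = [start + timedelta(days=i) for i in range(days)]
    compared = [d for d in window if d in new]          # days the new job ran
    mismatches = [d for d in compared if legacy.get(d) != new[d]]
    ready = bool(compared) and not mismatches and len(compared) >= days // 2
    return ready, mismatches
```

A day with no legacy record counts as a mismatch, because a legacy run that never happened is itself a divergence worth explaining before cutover.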
Operators debug faster when a single scheduled execution has a coherent story: enqueue timestamp, worker pick-up, channel send, model call, persistence, and acknowledgement back to the user or datastore. Emit structured fields (job, run_id, attempt, region) on every line so you can pivot from “nothing failed” to “the job never reached the network layer.” Keep correlation IDs stable across retries so duplicate deliveries are obvious instead of masquerading as unrelated errors.
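With Python's standard logging module, a formatter plus a LoggerAdapter is enough to stamp every line with the fields listed above. The field names follow the text. Generating run_id once per scheduled run and reusing it across attempts is what keeps retries correlated. These helpers are illustrative, not part of OpenClaw.

```python
import json
import logging
import uuid
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """One JSON object per line, always carrying the run's correlation fields."""
    FIELDS = ("job", "run_id", "attempt", "region")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        for name in self.FIELDS:
            payload[name] = getattr(record, name, None)
        return json.dumps(payload)


def new_run_id() -> str:
    """Create once per scheduled run and reuse it for every retry of that run."""
    return uuid.uuid4().hex


def run_logger(base: logging.Logger, job: str, run_id: str, attempt: int,
               region: str) -> logging.LoggerAdapter:
    """Adapter that stamps each line with job, run_id, attempt and region."""
    return logging.LoggerAdapter(
        base, {"job": job, "run_id": run_id, "attempt": attempt, "region": region})
```

A retry gets a new adapter with `attempt` incremented but the same `run_id`. A duplicate delivery then shows up as two send lines under one run_id rather than as unrelated errors.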
Pair textual logs with lightweight gauges: queue depth, oldest waiting job age, and last successful completion per job name. When queue depth grows while CPU stays flat, you are usually blocked on external APIs or file locks—not on model compute. That distinction saves hours of mis-tuned concurrency limits.
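Here is a small in-process version of those gauges, plus the rising-queue, flat-CPU heuristic from the paragraph above. The data structures and the 10-point CPU band are assumptions. In practice you would export these values to your metrics system rather than hold them in memory.

```python
from dataclasses import dataclass, field


@dataclass
class JobGauges:
    """Queue depth, oldest waiting job age, and last success per job name."""
    queue: list = field(default_factory=list)         # (job, enqueued_at) pairs
    last_success: dict = field(default_factory=dict)  # job -> completion time

    def enqueue(self, job: str, now: float) -> None:
        self.queue.append((job, now))

    def start_next(self) -> str:
        job, _ = self.queue.pop(0)
        return job

    def complete(self, job: str, now: float) -> None:
        self.last_success[job] = now

    def snapshot(self, now: float) -> dict:
        oldest = min((t for _, t in self.queue), default=None)
        return {
            "queue_depth": len(self.queue),
            "oldest_wait_s": 0.0 if oldest is None else now - oldest,
            "last_success": dict(self.last_success),
        }


def likely_blocked_on_io(depths: list[int], cpu_pct: list[float],
                         cpu_flat_band: float = 10.0) -> bool:
    """Queue depth rising while CPU stays inside a narrow band suggests I/O or lock waits."""
    growing = len(depths) >= 2 and depths[-1] > depths[0]
    flat = max(cpu_pct) - min(cpu_pct) <= cpu_flat_band
    return growing and flat
```

When `likely_blocked_on_io` is true, look at external API latency and file locks before raising concurrency limits.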
Laptops that sleep through a minute boundary will miss local timers; VMs paused for snapshots exhibit the same symptom. Prefer running authoritative schedulers on hosts with predictable clocks and disabled aggressive sleep policies—exactly the profile of a dedicated remote Mac. If you must tolerate occasional misses, document whether jobs are at-most-once or catch-up safe, and implement explicit catch-up windows instead of silently double-firing during the next wake cycle.
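Here is a sketch of explicit catch-up handling after a sleep or VM pause, assuming fixed-interval jobs. "at_most_once" is taken to mean missed slots are reported and dropped. "catch_up" replays only the slots inside a bounded window. Both the policy names and this reading of them are assumptions.

```python
from datetime import datetime, timedelta, timezone


def missed_fires(last_fire: datetime, now: datetime, interval: timedelta) -> list[datetime]:
    """Slots that should have fired after last_fire, up to and including now."""
    slots, t = [], last_fire + interval
    while t <= now:
        slots.append(t)
        t += interval
    return slots


def fires_on_wake(last_fire: datetime, now: datetime, interval: timedelta,
                  policy: str, catch_up_window: timedelta):
    """Return (slots_to_run, slots_skipped) after waking from sleep or a pause."""
    missed = missed_fires(last_fire, now, interval)
    if policy == "at_most_once":
        return [], missed  # never run a slot late; report what was lost
    if policy == "catch_up":
        cutoff = now - catch_up_window
        run = [t for t in missed if t >= cutoff]      # replay oldest first
        skipped = [t for t in missed if t < cutoff]   # too old: report, do not fire
        return run, skipped
    raise ValueError(f"unknown policy {policy!r}")
```

Logging the skipped list matters as much as the replayed one. That log is the evidence that a miss was deliberate rather than silent.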
Validate NTP health as part of your weekly SRE checklist. A few seconds of skew rarely breaks minute-level jobs, but hour-aligned financial reconciliations can land on the wrong calendar day when offsets stack with daylight-saving transitions.
When hardware is shared with interactive developers, upgrades and reboots become social negotiations. Moving Gateway plus OpenClaw cron to an isolated rental node gives you published maintenance windows, stable egress, and fewer surprise policy changes from corporate MDM tooling. Treat lease renewal the same way you treat certificate expiry: alert at 30/14/7 days, rehearse migration steps, and snapshot configuration volumes before any major OpenClaw upgrade.
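The 30/14/7-day rule fits in a few lines and works equally for lease and certificate expiry. alert_level and should_alert are hypothetical helpers. Persisting the last level sent, so that each threshold alerts only once, is left to the caller.

```python
from datetime import date
from typing import Optional

ALERT_DAYS = (30, 14, 7)  # thresholds from the text


def alert_level(expiry: date, today: date) -> Optional[int]:
    """Tightest threshold already crossed, or None while more than 30 days remain."""
    remaining = (expiry - today).days
    crossed = [d for d in ALERT_DAYS if remaining <= d]
    return min(crossed) if crossed else None


def should_alert(level: Optional[int], last_sent: Optional[int]) -> bool:
    """Alert once each time the crossed threshold changes."""
    return level is not None and level != last_sent
```

Run it from a daily job. Even if the host misses a day, the next run still reports the tightest threshold already crossed.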
MACCOME’s single-tenant Apple Silicon fleet across six regions is intentionally boring infrastructure: you trade a small recurring rental for removing “my laptop closed” and “IT pushed a patch” from the failure graph of your unattended assistants.
Beyond cron expressions, capture blast radius (which customers see output), rollback commands, dependency services, and an on-call escalation path. Add a “dry-run mode” flag where safe so new hires can rehearse without mutating production. Review these fields quarterly; stale runbooks are how silent regressions survive multiple reorganizations.
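A dry-run mode is easiest to keep honest when every mutation is declared as data before it runs. This sketch pairs a --dry-run flag (via argparse) with a list of (description, side effect) pairs. The job and its actions are hypothetical.

```python
import argparse
from typing import Callable

Action = tuple[str, Callable[[], None]]  # (human-readable description, side effect)


def execute(actions: list[Action], dry_run: bool) -> list[str]:
    """Run each side effect, or only record what would happen when dry_run is set."""
    log = []
    for description, effect in actions:
        if dry_run:
            log.append(f"DRY-RUN would {description}")
        else:
            effect()
            log.append(f"did {description}")
    return log


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="hypothetical housekeeping job")
    parser.add_argument("--dry-run", action="store_true",
                        help="describe mutations without performing them")
    return parser.parse_args(argv)
```

Because the descriptions come from the same list the real run executes, the dry-run output cannot drift from what production would actually do.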
Batch summarization jobs often look CPU-light while they saturate network egress or SQLite write locks. Before doubling concurrency, chart p95 latency for downstream APIs, rows updated per minute, and filesystem fsync latency. If the Mac is shared with interactive developers, lower the scheduling priority of automation processes (for example with nice; cgroups are a Linux mechanism and are not available on macOS) so a nightly digest cannot starve a human-driven debugging session. Alternatively, move the digest to a dedicated remote host where contention is contractual, not social.
When a job misfires, capture: expected trigger wall time in UTC, observed first log line timestamp, whether the process was running, disk free percentage, and last successful OAuth refresh. Duplicate fires deserve the same rigor: dump idempotency keys and channel message IDs so you can prove whether customers received one or two notifications. Feed those artifacts back into the runbook so the next on-call engineer inherits evidence instead of folklore.
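Capturing those fields in a fixed structure makes misfire evidence comparable across incidents. MisfireReport and its field names are illustrative and map one-to-one onto the list above. `deliveries` maps each idempotency key to the channel message IDs sent under it, which is what proves whether customers got one notification or two.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class MisfireReport:
    """Evidence for one misfire or duplicate fire (field names are illustrative)."""
    job: str
    expected_trigger_utc: str            # ISO 8601 with an explicit offset such as +00:00
    first_log_line_utc: Optional[str]    # None if the job never logged at all
    process_running: bool
    disk_free_pct: float
    last_oauth_refresh_utc: Optional[str]
    # idempotency key -> channel message IDs actually delivered under that key
    deliveries: dict = field(default_factory=dict)

    def lag_seconds(self) -> Optional[float]:
        """Observed first log line minus expected trigger time."""
        if self.first_log_line_utc is None:
            return None
        expected = datetime.fromisoformat(self.expected_trigger_utc)
        observed = datetime.fromisoformat(self.first_log_line_utc)
        return (observed - expected).total_seconds()

    def duplicate_delivery(self) -> bool:
        """True if any single key produced more than one channel message."""
        return any(len(ids) > 1 for ids in self.deliveries.values())

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Attach `to_json()` output to the incident. The next on-call engineer can then diff two reports instead of reconstructing the timeline from memory.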
OpenClaw’s surface area evolves: subcommands rename, defaults shift, and configuration search paths reorder. Pin the automation layer the same way you pin compilers—record the exact semver, hash the binary or container digest in your infrastructure repo, and gate upgrades behind a canary host that runs only synthetic cron jobs for 48 hours. Your scheduled jobs should print version banners at startup so log miners can correlate behavioral changes with releases without guessing from timestamps alone.
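This guide cannot assume the exact version flag of your openclaw build, so the sketch below records a SHA-256 of the binary found on PATH instead, which is also what the text suggests pinning. Compare it against the digest stored in your infrastructure repo. The helper names and banner fields are hypothetical.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from typing import Optional


def sha256_file(path: str, chunk: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks and return its hex SHA-256."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def version_banner(job: str, tool: str = "openclaw",
                   pinned_sha256: Optional[str] = None) -> dict:
    """Startup banner fields: which binary ran, its digest, and whether it matches the pin."""
    path = shutil.which(tool)
    digest = sha256_file(path) if path else None
    return {
        "job": job,
        "tool": tool,
        "path": path,
        "sha256": digest,
        "matches_pin": None if pinned_sha256 is None else digest == pinned_sha256,
        "started_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Emit the returned dict as one JSON log line at job startup. A `matches_pin` of False on any host is an unplanned upgrade.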
When multiple engineers install different nightly builds on laptops that also trigger manual “fix-it” jobs, you inherit irreproducible failures. Centralizing automation on a leased remote Mac with a single blessed image collapses that variance: everyone SSHes into the same toolchain, and cron semantics stay aligned with the gateway process that actually owns channel credentials.
FAQ
Jobs vanished after upgrade?
Diff the config volumes against your pre-upgrade snapshot and re-run openclaw doctor.
Conflict with systemd timers?
Partition scripts or add file locks; keep business semantics in OpenClaw.
Disk full, no logs?
Fix log rotation and mounts before raising cadence.