Multi-Agent Orchestration in Production: Design Patterns and Rollout (2026 Guide)

About 28 min read · MACCOME

If retrieval, reasoning, generation, and validation are squeezed into a single LLM agent, at scale you hit context overflow, serial timeouts, and a single point of failure. This guide lays out a path for an architecture review: ① the three MAS control modes and why a monolithic agent breaks; ② six orchestration patterns (covering 95%+ of production use), with LangGraph / AutoGen examples; ③ a LangGraph vs CrewAI vs AutoGen matrix and the two-layer MCP + A2A stack; ④ checkpoint persistence, observability, circuit breakers, token budgets, five common pitfalls, and a decision tree. It complements the MCP overview and the MCP Server tutorial; the scope here is strictly multi-agent orchestration → framework selection → production rollout.

Why a monolithic agent breaks at scale

  1. Context window bottleneck: intermediate artifacts fill the context, and downstream inference quality degrades nonlinearly; the typical symptom appears above a 60% fill rate.
  2. Specialization dilution: a single agent handles retrieval, codegen, and audit at once, so each subtask gets a suboptimal share of the prompt budget.
  3. Serial execution wall: total latency = sum(T_i); independent subtasks get no parallelism without explicit fan-out.
  4. Single point of failure: a crash or hallucination in one runtime node voids the whole workflow; there is no isolation boundary.

According to the MLflow 2026 report, a distributed multi-agent setup in the Google Agent Bake-Off cut wall-clock time from 1 hour to 10 minutes (6×). AdaptOrch (2026): the choice of orchestration topology affects SWE-bench results more than swapping the base model, with a gain of +12–23% under the right topology. The performance boundary here is not the model's FLOPs but the scheduling graph and the handoff contract.

MAS: definition and control plane

A Multi-Agent System (MAS) is a set of isolated AI agents, connected by a communication protocol and an orchestration runtime, that jointly complete a task a monolithic agent cannot fit within its latency/context budget.

Four invariants of every agent

  • Role isolation: one bounded subtask (retrieval, reasoning, generation, validation)
  • Tool surface: the minimal set of MCP tools / function calls needed for its role
  • State isolation: private context/memory; shared state only through a typed handoff schema
  • Replaceability: swap the model or prompt without rebuilding the whole graph

Three control modes (scheduling semantics)

  • Centralized: the orchestrator dispatches every task → full audit trail, but the orchestrator is a throughput bottleneck and a SPOF on the control plane
  • Decentralized: P2P message passing → lower tail latency but higher non-determinism; debugging without a distributed trace is nearly impossible
  • Hierarchical: Top Orchestrator → Team Lead → Worker; a compromise and the de facto production default (Replit, enterprise support bots)

Six orchestration patterns (95%+ production coverage)

Pattern 1: Sequential pipeline

Agent A's output becomes Agent B's input: a strict DAG with no parallel edges. Use for rigid dependency chains and fixed compliance flows (content pipeline, code review gate). Latency bound: T_total = Σ T_i.

python · LangGraph
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

# search_knowledge_base() and llm are assumed to be defined elsewhere
# (a retrieval helper and a chat model client, respectively).

class PipelineState(TypedDict):
    query: str
    retrieved_docs: str
    analysis: str
    final_report: str

def retrieval_agent(state: PipelineState) -> dict:
    # Each node returns a partial update that is merged into the graph state.
    return {"retrieved_docs": search_knowledge_base(state["query"])}

def analysis_agent(state: PipelineState) -> dict:
    result = llm.invoke(f"Analyze: {state['retrieved_docs']}")
    return {"analysis": result.content}

# Strict linear chain: START -> retriever -> analyzer -> END.
builder = StateGraph(PipelineState)
builder.add_node("retriever", retrieval_agent)
builder.add_node("analyzer", analysis_agent)
builder.add_edge(START, "retriever")
builder.add_edge("retriever", "analyzer")
builder.add_edge("analyzer", END)
pipeline = builder.compile()

Trade-off: deterministic and audit-friendly, but zero parallelism, fail-stop on any node, and no dynamic routing without a graph rewrite.

Pattern 2: Parallel fan-out / fan-in

Independent workers run in parallel and an aggregator merges their results. Latency bound: T_total = max(T_1, T_2, ..., T_n) + T_merge. Use for multi-source research and multi-axis risk scoring.

python · LangGraph Send API
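import operator
from typing import Annotated, TypedDict

from langgraph.types import Send

# search_by_source() is assumed to be defined elsewhere.

class ResearchState(TypedDict):
    query: str
    # operator.add reducer: each branch's list is appended instead of overwriting.
    research_results: Annotated[list, operator.add]
    final_synthesis: str

def supervisor(state: ResearchState) -> list[Send]:
    # One Send per source; each Send carries its own input payload for the worker.
    # In LangGraph, functions that return Send objects are normally used as
    # conditional-edge path functions (builder.add_conditional_edges).
    return [Send("research_worker", {"query": state["query"], "source": s})
            for s in ("academic", "industry", "news")]

def research_worker(state: dict) -> dict:
    # Receives the Send payload (query + source), not the full ResearchState.
    return {"research_results": [search_by_source(state["query"], state["source"])]}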

Mechanism: the Send API returns a list of Send objects, and the runtime spawns one parallel invocation of the target node per Send. The Annotated[list, operator.add] reducer aggregates branch outputs without an explicit lock; the sync barrier comes from defer=True on the supervisor node (see Pitfall 5).
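The latency bound T_total = max(T_i) + T_merge can be observed without any framework. The following is a minimal stdlib sketch, not LangGraph code: `worker` and `fan_out_fan_in` are hypothetical names, and `asyncio.sleep` stands in for an I/O-bound agent call.

```python
import asyncio
import time
from typing import Dict, List, Tuple

async def worker(source: str, delay_s: float) -> str:
    # Stand-in for an I/O-bound agent call (sleeps instead of calling an LLM).
    await asyncio.sleep(delay_s)
    return f"{source}: done"

async def fan_out_fan_in(delays: Dict[str, float]) -> Tuple[List[str], float]:
    start = time.perf_counter()
    # Fan-out: start all workers concurrently.
    # Fan-in: gather() returns results in the order the workers were passed in.
    results = await asyncio.gather(*(worker(s, d) for s, d in delays.items()))
    return list(results), time.perf_counter() - start

# Three workers of 0.2 s each: serial cost would be 0.6 s,
# while the concurrent run is bounded by the slowest worker.
results, elapsed = asyncio.run(
    fan_out_fan_in({"academic": 0.2, "industry": 0.2, "news": 0.2})
)
```

The elapsed time tracks the slowest branch rather than the sum, which is the whole argument for fan-out on independent subtasks.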

Pattern 3: Hierarchical supervisor-worker

Supervisor: intent classification + task decomposition + routing. Worker: domain-specific execution. Use for heterogeneous skills (researcher, writer, coder) and dynamic intent (Replit assistant, tier-2 support).

Two-tier routing: an L1 keyword/hash router (<1 ms, zero LLM cost) handles high-confidence intents; an L2 LLM router handles ambiguous input. This reduces token burn on the hot path.
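A minimal sketch of the two-tier idea, under stated assumptions: the keyword table, the agent names, and `fake_llm_router` are all hypothetical, and a real L2 router would call an LLM classifier instead of returning a constant.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical keyword table: high-confidence intents resolved without an LLM call.
KEYWORD_ROUTES: Dict[str, str] = {
    "refund": "billing_agent",
    "invoice": "billing_agent",
    "stack trace": "coder_agent",
}

def l1_route(text: str) -> Optional[str]:
    """L1: plain substring match; returns None on a miss."""
    lowered = text.lower()
    for keyword, agent in KEYWORD_ROUTES.items():
        if keyword in lowered:
            return agent
    return None

def route(text: str, llm_router: Callable[[str], str]) -> Tuple[str, str]:
    """Return (agent_name, tier). The LLM router is called only on an L1 miss."""
    agent = l1_route(text)
    if agent is not None:
        return agent, "L1"
    return llm_router(text), "L2"

def fake_llm_router(text: str) -> str:
    # Placeholder for a real LLM classification call.
    return "researcher_agent"
```

Logging the tier alongside the chosen agent makes it possible to measure how much traffic actually stays on the zero-cost path.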

Pattern 4: Swarm / network

P2P task passing with no central coordinator; stop conditions are a round limit, a consensus vote, or a wall-clock timeout. Use for multi-round debate (code review, architecture option scoring). Production boundary: non-determinism is high, so prefer hierarchical. AutoGen GroupChat: set a hard cap of max_round=6 to prevent infinite loops.
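The three stop conditions can be sketched as a plain loop. This is a framework-free illustration, not AutoGen's GroupChat implementation; `run_debate` is a hypothetical name, agents are modeled as functions from the transcript to a reply, and "consensus" is simplified to all agents giving the same reply in a round.

```python
import time
from typing import Callable, List, Tuple

Agent = Callable[[List[str]], str]

def run_debate(agents: List[Agent], max_round: int = 6,
               timeout_s: float = 30.0) -> Tuple[List[str], str]:
    """Run rounds until consensus, the round limit, or the wall-clock timeout."""
    transcript: List[str] = []
    deadline = time.monotonic() + timeout_s
    for _ in range(max_round):
        replies = []
        for agent in agents:
            if time.monotonic() > deadline:
                return transcript, "timeout"
            reply = agent(transcript)
            transcript.append(reply)
            replies.append(reply)
        # Simplified consensus vote: every agent gave the same reply this round.
        if len(set(replies)) == 1:
            return transcript, "consensus"
    return transcript, "round_limit"
```

Returning the stop reason alongside the transcript lets the caller treat "round_limit" and "timeout" as degraded outcomes rather than silent successes.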

Pattern 5: Blackboard

A shared structured workspace; agents read and write based on preconditions (rule engine or event trigger), with no explicit scheduler. Use for hour- or day-scale async work, heterogeneous microservices, and cases where routing cannot be fixed upfront.
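A minimal in-process sketch of the blackboard idea, assuming a dict as the workspace and (name, precondition, action) triples as knowledge sources. All names here are hypothetical; a production blackboard would usually sit on a shared store with event triggers rather than a polling loop.

```python
from typing import Any, Callable, Dict, List, Tuple

Board = Dict[str, Any]
# A knowledge source: (name, precondition on the board, action that writes to it).
Source = Tuple[str, Callable[[Board], bool], Callable[[Board], None]]

def run_blackboard(board: Board, sources: List[Source], max_steps: int = 20) -> List[str]:
    """Fire the first source whose precondition holds, until none is ready."""
    fired: List[str] = []
    for _ in range(max_steps):
        ready = [src for src in sources if src[1](board)]
        if not ready:
            break
        name, _precondition, action = ready[0]
        action(board)
        fired.append(name)
    return fired

sources: List[Source] = [
    ("extract",
     lambda b: "raw" in b and "facts" not in b,
     lambda b: b.update(facts=b["raw"].split(";"))),
    ("summarize",
     lambda b: "facts" in b and "summary" not in b,
     lambda b: b.update(summary=f"{len(b['facts'])} facts")),
]

board: Board = {"raw": "a;b;c"}
fired = run_blackboard(board, sources)
```

No source names another source: the order emerges from what is on the board, which is why routing does not need to be fixed upfront.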

Pattern 6: Hybrid

A combination of patterns in one graph, typically supervisor + pipeline. Example: intent router → simple query gets a direct answer; complex report → supervisor hierarchy → parallel research fan-out → QA pipeline (automated review → human gate → publish).

Framework matrix: LangGraph vs CrewAI vs AutoGen

Dimension | LangGraph | CrewAI | AutoGen (Microsoft)
Paradigm | State machine graph | Role-based crew | Conversational multi-agent
Runtime | Python / JS/TS | Python | Python / .NET
Learning curve | Steep | Gentle | Moderate
State / checkpoint | Native (PostgresSaver) | Self-implement | Limited
Human-in-the-Loop | Native (interrupt()) | Self-implement | Supported
Observability | LangSmith | Minimal | Azure Monitor
Production readiness | 5/5 | 3/5 | 4/5
Rapid prototype | 3/5 | 5/5 | 4/5
Azure integration | 3/5 | 2/5 | 5/5
Best fit | Stateful complex workflow | Role-based content pipeline | Debate / iterative dialog

LangGraph: finance/healthcare SLAs, checkpoint + HITL, conditional branches and loops as first-class graph edges.

CrewAI: a 1–2 day POC, a "team of roles" mental model, content/research pipelines without hard state requirements.

AutoGen: Azure-native stack, multi-round agent debate, R&D conversation topology experiments.

Two-layer protocol stack: MCP + A2A

As of 2026, the two are complementary layers under the Linux Foundation Agentic AI Foundation:

  • MCP (vertical): Agent ↔ tools / DB / API; Anthropic-led; a write-once tool surface for all MCP clients
  • A2A (horizontal): Agent ↔ Agent; open-sourced by Google (April 2025), v1.0 (early 2026), 50+ partners (Atlassian, Salesforce, SAP); task delegation, capability discovery, state sync via JSON-RPC 2.0

Each A2A agent publishes an Agent Card at /.well-known/agent.json; the orchestrator discovers agents and delegates to them over JSON-RPC. Details: MCP protocol guide, MCP Server tutorial.
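The discover-then-delegate flow can be sketched without any network calls. Only the JSON-RPC 2.0 envelope (`jsonrpc`, `method`, `params`, `id`) is standard here; the card contents, the skill ids, the agent URLs, and the method name `example/delegate` are hypothetical placeholders, so consult the A2A specification for the actual Agent Card schema and method names.

```python
import json
from typing import Any, Dict, List, Optional

def pick_agent(cards: List[Dict[str, Any]], skill_id: str) -> Optional[Dict[str, Any]]:
    """Capability discovery: return the first card that advertises the skill."""
    for card in cards:
        if any(skill.get("id") == skill_id for skill in card.get("skills", [])):
            return card
    return None

def jsonrpc_request(method: str, params: Dict[str, Any], request_id: int) -> str:
    """Build a JSON-RPC 2.0 request body."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )

# Hypothetical cards, as an orchestrator might hold them after fetching
# each agent's published Agent Card.
cards = [
    {"name": "researcher", "url": "https://agents.example/researcher",
     "skills": [{"id": "web_research"}]},
    {"name": "reviewer", "url": "https://agents.example/reviewer",
     "skills": [{"id": "code_review"}]},
]

target = pick_agent(cards, "code_review")
# "example/delegate" is a placeholder, not an A2A method name.
body = jsonrpc_request("example/delegate", {"task": "review PR diff"}, request_id=1)
```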

Production engineering

6.1 Checkpoint persistence and resume

With PostgresSaver as the LangGraph checkpoint backend, a process restart does not lose graph state; a thread_id spans sessions for long-horizon workflows.
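To illustrate the idea of state keyed by thread_id surviving a restart, here is a sketch that swaps in the stdlib `sqlite3` module. It is not PostgresSaver and does not mirror its API; `SqliteCheckpointStore` and its methods are hypothetical names.

```python
import json
import os
import sqlite3
import tempfile
from typing import Any, Dict, Optional, Tuple

class SqliteCheckpointStore:
    """Keeps the latest (step, state) per thread_id in a SQLite file."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT PRIMARY KEY, step INTEGER NOT NULL, state TEXT NOT NULL)"
        )
        self._conn.commit()

    def save(self, thread_id: str, step: int, state: Dict[str, Any]) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self._conn.commit()

    def load(self, thread_id: str) -> Optional[Tuple[int, Dict[str, Any]]]:
        row = self._conn.execute(
            "SELECT step, state FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return None if row is None else (row[0], json.loads(row[1]))

    def close(self) -> None:
        self._conn.close()

# Simulate a restart: write with one store object, read with a fresh one.
db_path = os.path.join(tempfile.mkdtemp(), "checkpoints.db")
store = SqliteCheckpointStore(db_path)
store.save("thread-42", 2, {"analysis": "draft"})
store.close()

restarted = SqliteCheckpointStore(db_path)
resumed = restarted.load("thread-42")
restarted.close()
```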

6.2 Human-in-the-Loop

Call interrupt() before side-effect operations (production DB write, payment trigger): the graph pauses until an explicit human acknowledgment.

6.3 Circuit breaker and retry policy

For external agent calls, use a state machine: CLOSED → OPEN (on reaching the failure threshold) → HALF_OPEN (probe). This prevents cascading failures when the downstream LLM API is in a 429 storm.
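The three states can be sketched as a small wrapper class. This is an illustrative single-threaded version, not a library API: `CircuitBreaker`, `CircuitOpenError`, and the default thresholds are assumptions, and the injectable `clock` exists only to make the behavior testable.

```python
import time
from typing import Any, Callable

class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open; call rejected")
            self.state = "HALF_OPEN"  # timeout elapsed: allow one probe call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed probe, or too many consecutive failures, opens the circuit.
            if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
                self.state = "OPEN"
                self._opened_at = self._clock()
            raise
        # Any success resets the breaker.
        self._failures = 0
        self.state = "CLOSED"
        return result
```

While the circuit is open, callers fail fast with CircuitOpenError instead of adding more load to an API that is already rate-limiting.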

6.4 Token budget enforcement

TokenBudgetManager runs a pre-call check; exceeding the budget raises BudgetExceededException; per-agent metering supports cost attribution and FinOps.
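The text names TokenBudgetManager and BudgetExceededException but not their interface, so the following is one possible shape under that assumption: the `check` / `record` methods and the `used_by_agent` attribute are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict

class BudgetExceededException(Exception):
    """Raised by the pre-call check when a call would exceed the budget."""

class TokenBudgetManager:
    def __init__(self, max_total_tokens: int) -> None:
        self.max_total_tokens = max_total_tokens
        # Per-agent metering for cost attribution.
        self.used_by_agent: DefaultDict[str, int] = defaultdict(int)

    @property
    def total_used(self) -> int:
        return sum(self.used_by_agent.values())

    def check(self, agent: str, estimated_tokens: int) -> None:
        """Pre-call check: refuse the call before any tokens are spent."""
        if self.total_used + estimated_tokens > self.max_total_tokens:
            raise BudgetExceededException(
                f"{agent}: {self.total_used} used + {estimated_tokens} estimated "
                f"> {self.max_total_tokens}"
            )

    def record(self, agent: str, tokens_used: int) -> None:
        """Post-call metering with the actual usage reported by the API."""
        self.used_by_agent[agent] += tokens_used

budget = TokenBudgetManager(max_total_tokens=1000)
budget.check("retriever", 400)
budget.record("retriever", 380)
budget.check("analyzer", 600)
budget.record("analyzer", 610)
```

Checking against an estimate and recording the actual figure keeps the two concerns separate: the check gates spend, the record feeds attribution.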

Observability: distributed traces instead of a green dashboard

MAST (1,642 execution traces), failure distribution in multi-agent runs:

Failure type | Share | Mechanism
System design | 41.77% | Duplicate steps, wrong tool selection, context overflow, missing termination
Inter-agent misalignment | 36.94% | Context loss at handoff; Agent A's hallucination becomes a "fact" for Agent B
Verification failure | 21.30% | Premature stop, incomplete validation gate

Gap: 57% of organizations already run agents in production, but only 8% have deployed LLM observability. Errors often return HTTP 200: APM is green while the output is wrong. Without a correlation_id, cross-agent root cause analysis is impossible.

Distributed tracing: every agent invocation carries a correlation_id; OpenTelemetry span attributes include agent.name, tokens_used, status, and the handoff schema version.
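The propagation rule can be shown with the standard library alone. This sketch does not use the OpenTelemetry API: `traced_call`, the in-memory `SPANS` list, and the `handoff.schema_version` key are hypothetical stand-ins for a real tracer and exporter, and agents are assumed to return a dict that may contain `tokens_used`.

```python
import time
import uuid
from typing import Any, Callable, Dict, List

SPANS: List[Dict[str, Any]] = []  # stand-in for a span exporter

def traced_call(agent_name: str, fn: Callable[[Dict[str, Any]], Dict[str, Any]],
                payload: Dict[str, Any], correlation_id: str,
                schema_version: str = "1") -> Dict[str, Any]:
    """Invoke an agent and record one span, whether the call succeeds or fails."""
    status, tokens_used = "error", 0
    start = time.perf_counter()
    try:
        result = fn(payload)
        status = "ok"
        tokens_used = int(result.get("tokens_used", 0))
        return result
    finally:
        SPANS.append({
            "correlation_id": correlation_id,
            "agent.name": agent_name,
            "tokens_used": tokens_used,
            "status": status,
            "handoff.schema_version": schema_version,
            "duration_ms": (time.perf_counter() - start) * 1000.0,
        })

# One correlation_id is created at the workflow entry point and passed to every hop.
correlation_id = str(uuid.uuid4())
docs = traced_call("retriever", lambda p: {"docs": ["d1"], "tokens_used": 120},
                   {"query": "q"}, correlation_id)
traced_call("analyzer", lambda p: {"analysis": "ok", "tokens_used": 340},
            docs, correlation_id)
```

Filtering spans by a single correlation_id then reconstructs the full cross-agent path of one task, including hops that returned "ok" with a wrong output.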

SLO targets: E2E task completion >85%; P95 latency <30 s; per-agent error rate <5%; retry count; token cost per task; LLM-as-a-Judge quality sample.

Pitfalls and guardrails

Pitfall 1: Context pollution

Agent A's hallucination propagates through B and C, with HTTP 200 at every hop. Guardrail: JSON Schema validation at every handoff; confidence <0.7 → reject; required fields enforced.
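A sketch of a handoff gate, with two caveats: it uses a hand-rolled field check in place of a JSON Schema library, and the field names (`claim`, `sources`, `confidence`) and `HandoffRejected` are hypothetical. Only the 0.7 threshold comes from the guardrail above.

```python
from typing import Any, Dict

# Hypothetical handoff contract: field name -> accepted type(s).
REQUIRED_FIELDS = {"claim": str, "sources": list, "confidence": (int, float)}
MIN_CONFIDENCE = 0.7

class HandoffRejected(ValueError):
    """Raised when a payload must not be passed to the next agent."""

def validate_handoff(payload: Dict[str, Any]) -> Dict[str, Any]:
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            raise HandoffRejected(f"missing required field: {field}")
        if not isinstance(payload[field], expected_type):
            raise HandoffRejected(f"wrong type for field: {field}")
    if payload["confidence"] < MIN_CONFIDENCE:
        raise HandoffRejected(
            f"confidence {payload['confidence']} below {MIN_CONFIDENCE}"
        )
    return payload
```

Rejecting at the boundary turns a silent HTTP 200 into an explicit failure that the orchestrator can retry or escalate.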

Pitfall 2: Infinite loop / runaway token burn

A retry loop without a cap can multiply spend 100× within minutes. Guardrail: MAX_ITERATIONS=10, MAX_TOOL_CALLS_PER_AGENT=20, MAX_TOTAL_TOKENS=50_000; interrupt_before on high-cost nodes.
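The three caps can be enforced by one counter object shared across a run. The constants are the ones listed above; `RunGuard`, `LimitExceeded`, and the method names are hypothetical, and wiring the guard into a specific framework's loop is left out.

```python
from collections import defaultdict
from typing import DefaultDict

MAX_ITERATIONS = 10
MAX_TOOL_CALLS_PER_AGENT = 20
MAX_TOTAL_TOKENS = 50_000

class LimitExceeded(RuntimeError):
    """Raised as soon as any hard cap is crossed."""

class RunGuard:
    def __init__(self) -> None:
        self.iterations = 0
        self.tool_calls: DefaultDict[str, int] = defaultdict(int)
        self.total_tokens = 0

    def tick_iteration(self) -> None:
        self.iterations += 1
        if self.iterations > MAX_ITERATIONS:
            raise LimitExceeded(f"more than {MAX_ITERATIONS} iterations")

    def record_tool_call(self, agent: str) -> None:
        self.tool_calls[agent] += 1
        if self.tool_calls[agent] > MAX_TOOL_CALLS_PER_AGENT:
            raise LimitExceeded(f"{agent}: more than {MAX_TOOL_CALLS_PER_AGENT} tool calls")

    def record_tokens(self, tokens: int) -> None:
        self.total_tokens += tokens
        if self.total_tokens > MAX_TOTAL_TOKENS:
            raise LimitExceeded(f"more than {MAX_TOTAL_TOKENS} total tokens")

# A loop that would never terminate on its own is cut off by the guard.
guard = RunGuard()
completed = 0
try:
    while True:
        guard.tick_iteration()
        completed += 1
except LimitExceeded:
    pass
```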

Pitfall 3: Over-engineering

Turning a two-step LLM chain into 8 agents makes the debug surface grow exponentially. Rule: start sequential; add an agent only on evidence (concurrency, context overflow, independent upgrade path). Sweet spot: 3–8 agents.

Pitfall 4: Demo-to-production gap

The demo covers the happy path; an edge-case input breaks the chain. Guardrail: input length cap, prompt injection filter, PII scrub, harmful content gate; run ProductionGuardrails from day one.

Pitfall 5: Parallel sync race (LangGraph)

The supervisor is re-entered before a slow branch finishes → duplicate dispatch. Fix: builder.add_node("supervisor", supervisor_node, defer=True), which acts as an explicit sync barrier after fan-out.

Decision tree (topology selection)

  1. Linear dependency? → Yes: are subtasks parallelizable? → No → Sequential pipeline; Yes → Fan-out + pipeline hybrid
  2. No linear dependency: is there an agent with decision authority? → Yes: need sub-teams? → No → Supervisor-worker; Yes → Hierarchical (supervisors of supervisors)
  3. No decision authority: long async (hours+)? → Yes → Blackboard; No: agents ≤5 and termination explicit? → Yes → Swarm (hard stop rules); No → Re-decompose → hierarchical
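The decision tree above can be encoded as a single function, which is handy for recording the inputs behind a topology choice in a review document. The function name, parameter names, and return strings are hypothetical; the branch order follows the three steps above.

```python
def select_topology(linear_dependency: bool,
                    parallelizable: bool = False,
                    has_decision_authority: bool = False,
                    needs_sub_teams: bool = False,
                    long_async: bool = False,
                    agent_count: int = 0,
                    explicit_termination: bool = False) -> str:
    # Step 1: linear dependency chain.
    if linear_dependency:
        return "fan-out + pipeline hybrid" if parallelizable else "sequential pipeline"
    # Step 2: an agent with decision authority exists.
    if has_decision_authority:
        return "hierarchical" if needs_sub_teams else "supervisor-worker"
    # Step 3: no decision authority.
    if long_async:
        return "blackboard"
    if agent_count <= 5 and explicit_termination:
        return "swarm"
    return "re-decompose -> hierarchical"
```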

Ten steps: selection → production deploy

  1. Validate the monolithic bottleneck: measure context fill, serial latency, and failure modes on a real workload to confirm the need for MAS and avoid over-design.
  2. Select topology: use the decision tree above; default to sequential; add fan-out only with concurrency evidence.
  3. Pick a framework: use the LangGraph / CrewAI / AutoGen matrix; finance/healthcare/long-running → LangGraph.
  4. Define agent boundaries: single responsibility, isolated tool set, typed I/O schema (3–8 agents).
  5. Wire the MCP tool layer: connect external systems through an MCP Server, with no N× duplicate integration glue.
  6. Cross-agent via A2A: publish an Agent Card; the orchestrator delegates by capability discovery.
  7. Checkpoint persistence: PostgreSQL + thread_id; resume + HITL.
  8. Observability stack: OpenTelemetry + SLO dashboard + LLM-as-a-Judge sampling.
  9. Hard guardrails: token budget, iteration cap, circuit breaker, handoff schema validation.
  10. Migrate to a 24/7 host: orchestration and long-lived MCP/A2A connections should not live on a sleeping laptop; use a dedicated Mac node for the Gateway and checkpoint store.

Three metrics for architecture review

  • Google Agent Bake-Off: 1 h → 10 min (6×), the distributed MAS wall-clock gain on an internal benchmark.
  • AdaptOrch: +12–23% on SWE-bench; topology selection > base model swap.
  • MAST: 57% prod agents / 8% observability; 41.77% system design failures + 36.94% misalignment = incident surface without a trace.

Summary and 2026 trends

Invariants: ① topology > model; ② start sequential; ③ MCP + A2A is the standard stack; ④ observability is mandatory; ⑤ 3–8 agents, then hierarchical decomposition.

Watchlist 2026: federated orchestration (sub-orchestrator federation), multimodal MAS, adaptive topology (AdaptOrch direction), EU AI Act audit chain for agent decisions.

Running LangGraph Gateway + MCP Server + an A2A endpoint on a sleeping laptop or a shared dev machine carries three hidden costs: checkpoint/session loss on lid-close, environment drift that leads to handoff schema mismatch, and no way to run 24/7 multi-step workflows. For stable orchestration and long-lived MCP/A2A connections, a production stack on a dedicated MACCOME Mac mini (M4 / M4 Pro) node usually has a lower TCO than fighting sleep policy locally; pricing: Mac Mini rental.

FAQ

LangGraph, CrewAI, or AutoGen: which one to choose?

Production state + HITL + complex branch → LangGraph; 1–2 day role-based POC → CrewAI; Azure stack + multi-round debate → AutoGen. See the matrix above.

Are MCP and A2A different layers?

MCP is vertical (Agent ↔ tools); A2A is horizontal (Agent ↔ Agent). More: MCP protocol guide.

How many agents are optimal in production?

Empirical sweet spot: 3–8. Beyond that, coordination overhead dominates; split into sub-teams under a hierarchical supervisor.

What host should a multi-agent system run on?

Avoid lid-close: it breaks checkpoints and long-lived connections. Use a MACCOME M4/M4 Pro running 24/7 for LangGraph Gateway and MCP Server. Pricing: Mac Mini rental; onboarding: help center.