Если retrieval, reasoning, generation и validation сжаты в один LLM Agent — при scale вы упираетесь в context overflow, serial timeout и single point of failure. Здесь путь для architecture review: ① три control mode MAS и почему monolithic Agent ломается; ② шесть orchestration patterns (95%+ production), с примерами LangGraph / AutoGen; ③ матрица LangGraph vs CrewAI vs AutoGen и двухслойный MCP+A2A; ④ checkpoint persistence, observability, circuit breaker, token budget, пять типовых pitfalls и decision tree. Дополняет разбор MCP и туториал MCP Server — здесь только multi-agent orchestration → framework selection → production rollout.
MLflow 2026 report: Google Agent Bake-Off — distributed multi-agent снизил wall-clock с 1 часа до 10 минут (6×). AdaptOrch (2026): выбор orchestration topology влияет на SWE-bench сильнее, чем смена base model — +12–23% при корректной топологии. Performance boundary здесь не в FLOPs модели, а в scheduling graph и handoff contract.
Multi-Agent System (MAS) — набор изолированных AI Agent, связанных communication protocol и orchestration runtime, совместно выполняющих task, который monolithic Agent не может уложить в latency/context budget.
Output Agent A → input Agent B; strict DAG без parallel edges. Применение: жёсткие dependency chain, fixed compliance flow (content pipeline, code review gate). Latency bound: T_total = Σ T_i.
from langgraph.graph import StateGraph, START, END
from typing import TypedDict
class PipelineState(TypedDict):
query: str
retrieved_docs: str
analysis: str
final_report: str
def retrieval_agent(state):
return {"retrieved_docs": search_knowledge_base(state["query"])}
def analysis_agent(state):
result = llm.invoke(f"Analyze: {state['retrieved_docs']}")
return {"analysis": result.content}
builder = StateGraph(PipelineState)
builder.add_node("retriever", retrieval_agent)
builder.add_node("analyzer", analysis_agent)
builder.add_edge(START, "retriever")
builder.add_edge("retriever", "analyzer")
builder.add_edge("analyzer", END)
pipeline = builder.compile()
Trade-off: deterministic, audit-friendly; но zero parallelism, fail-stop на любом node, нет dynamic routing без graph rewrite.
Независимые worker параллельно; aggregator merge. Latency bound: T_total = max(T1, T2, ..., Tn) + T_merge. Применение: multi-source research, multi-axis risk scoring.
from langgraph.types import Send
from typing import Annotated
import operator
class ResearchState(TypedDict):
query: str
research_results: Annotated[list, operator.add]
final_synthesis: str
def supervisor(state):
return [Send("research_worker", {"query": state["query"], "source": s})
for s in ("academic", "industry", "news")]
def research_worker(state):
return {"research_results": [search_by_source(state["query"], state["source"])]}
Mechanism: Send API возвращает список Send objects — runtime spawn parallel subgraph invocations. Annotated[list, operator.add] reducer агрегирует branch output без explicit lock; sync barrier — через defer=True на supervisor node (см. Pitfall 5).
Supervisor: intent classification + task decomposition + routing. Worker: domain-specific execution. Применение: heterogeneous skills (researcher, writer, coder), dynamic intent (Replit assistant, tier-2 support).
Two-tier routing: L1 keyword/hash router (<1 ms, zero LLM cost) для high-confidence intents; L2 LLM router для ambiguous input — снижает token burn на hot path.
P2P task passing без central coordinator; stop condition — round limit, consensus vote, wall-clock timeout. Применение: multi-round debate (code review, architecture option scoring). Production boundary: высокая non-determinism; предпочитайте hierarchical. AutoGen GroupChat: hard cap max_round=6 против infinite loop.
Shared structured workspace; agents read/write по precondition (rule engine или event trigger), без explicit scheduler. Применение: hour/day-scale async, heterogeneous microservices, routing нельзя зафиксировать upfront.
Комбинация patterns в одном graph — типично supervisor + pipeline. Пример: Intent router → simple query direct answer; complex report → supervisor hierarchy → parallel research fan-out → QA pipeline (automated review → human gate → publish).
| Измерение | LangGraph | CrewAI | AutoGen (Microsoft) |
|---|---|---|---|
| Paradigm | State machine graph | Role-based crew | Conversational multi-agent |
| Runtime | Python / JS/TS | Python | Python / .NET |
| Learning curve | Крутая | Пологая | Средняя |
| State / checkpoint | Native (PostgresSaver) | Self-implement | Ограничено |
| Human-in-the-Loop | Native (interrupt()) | Self-implement | Supported |
| Observability | LangSmith | Minimal | Azure Monitor |
| Production readiness | 5/5 | 3/5 | 4/5 |
| Rapid prototype | 3/5 | 5/5 | 4/5 |
| Azure integration | 3/5 | 2/5 | 5/5 |
| Best fit | Stateful complex workflow | Role-based content pipeline | Debate / iterative dialog |
LangGraph: finance/healthcare SLA, checkpoint + HITL, conditional branch/loop как first-class graph edge.
CrewAI: 1–2 day POC, mental model «команда ролей», content/research pipeline без hard state requirements.
AutoGen: Azure-native stack, multi-round agent debate, R&D conversation topology experiments.
2026: complementary layers под Linux Foundation Agentic AI Foundation:
Каждый A2A Agent публикует Agent Card на /.well-known/agent.json; Orchestrator discovery + delegate через JSON-RPC. Детали: MCP protocol guide, MCP Server tutorial.
PostgresSaver как LangGraph checkpoint backend — process restart не теряет graph state; thread_id span sessions для long-horizon workflow.
interrupt() перед side-effect ops (production DB write, payment trigger) — graph pause до explicit human ack.
External agent calls: state machine CLOSED → OPEN (failure threshold) → HALF_OPEN (probe). Предотвращает cascade когда downstream LLM API в 429 storm.
TokenBudgetManager pre-call check; exceed → BudgetExceededException; per-agent metering для cost attribution и FinOps.
MAST (1642 execution traces) — failure distribution в multi-agent runs:
| Тип сбоя | Доля | Mechanism |
|---|---|---|
| System design | 41.77% | Duplicate steps, wrong tool selection, context overflow, missing termination |
| Inter-agent misalignment | 36.94% | Context loss на handoff; hallucination Agent A → «fact» для Agent B |
| Verification failure | 21.30% | Premature stop, incomplete validation gate |
Gap: 57% org уже run agents в production, но только 8% внедрили LLM observability. Ошибки часто возвращают HTTP 200 — APM green, output wrong. Без correlation_id cross-agent root cause analysis невозможен.
Distributed tracing: каждый agent invocation несёт correlation_id; OpenTelemetry span attributes — agent.name, tokens_used, status, handoff schema version.
SLO targets: E2E task completion >85%; P95 latency <30 s; per-agent error rate <5%; retry count; token cost per task; LLM-as-a-Judge quality sample.
Hallucination Agent A propagates через B, C; HTTP 200 на каждом hop. Guardrail: JSON Schema validation на каждом handoff; confidence <0.7 → reject; required fields enforced.
Retry loop без cap — spend 100× за минуты. Guardrail: MAX_ITERATIONS=10, MAX_TOOL_CALLS_PER_AGENT=20, MAX_TOTAL_TOKENS=50_000; interrupt_before на high-cost nodes.
Two-step LLM chain → 8 agents: debug surface растёт экспоненциально. Rule: start sequential; add agent только при evidence (concurrency, context overflow, independent upgrade path). Sweet spot: 3–8 agents.
Happy-path demo; edge input ломает chain. Guardrail: input length cap, prompt injection filter, PII scrub, harmful content gate — ProductionGuardrails с day one.
Supervisor re-enter до завершения slow branch → duplicate dispatch. Fix: builder.add_node("supervisor", supervisor_node, defer=True) — explicit sync barrier после fan-out.
thread_id; resume + HITL.Invariants: ① topology > model; ② start sequential; ③ MCP + A2A — standard stack; ④ observability mandatory; ⑤ 3–8 agents, далее hierarchical decomposition.
Watchlist 2026: federated orchestration (sub-orchestrator federation), multimodal MAS, adaptive topology (AdaptOrch direction), EU AI Act audit chain для agent decisions.
LangGraph Gateway + MCP Server + A2A endpoint на sleeping laptop или shared dev machine — три hidden costs: checkpoint/session loss при lid-close, environment drift → handoff schema mismatch, невозможность 7×24 multi-step workflow. Для stable orchestration и MCP/A2A long connection production stack на MACCOME Mac mini (M4 / M4 Pro) dedicated node обычно ниже TCO, чем борьба с sleep policy локально; тарифы: аренда Mac Mini.
FAQ
LangGraph, CrewAI или AutoGen — что выбрать?
Production state + HITL + complex branch → LangGraph; 1–2 day role-based POC → CrewAI; Azure stack + multi-round debate → AutoGen. См. матрицу выше.
MCP и A2A — разные слои?
MCP — vertical (Agent ↔ tools); A2A — horizontal (Agent ↔ Agent). Подробнее: MCP protocol guide.
Сколько agents оптимально в production?
Empirical sweet spot: 3–8. Beyond — coordination overhead доминирует; split в sub-teams с hierarchical supervisor.
На каком хосте run multi-agent system?
Избегайте lid-close — рвёт checkpoint и long connection. MACCOME M4/M4 Pro 7×24 для LangGraph Gateway и MCP Server. Тарифы: аренда Mac Mini; onboarding: центр помощи.