Multi-Agent Systems: When One Model Isn't Enough
Some tasks exceed what a single agent can handle. Multi-agent systems, in which orchestrators direct specialist workers, are the emerging pattern for enterprise AI workflows.
A single agent operating with a 128K context window can handle a substantial amount of work. Simple research tasks, document extraction, workflow routing, notification drafting — all of these fit comfortably within the single-agent model. But complex enterprise workflows often do not. When a task requires genuinely distinct specialisations applied in sequence, when independent subtasks can be parallelised to reduce total runtime, or when a workflow is long enough that a single context window becomes a constraint, the case for multiple coordinated agents becomes real.
Multi-agent systems are not a more impressive version of single agents. They are a structural choice made in response to specific limitations, and they create their own class of problems while solving the ones that motivated the choice. This post covers when multi-agent architecture is justified, what the key patterns look like, and what goes wrong.
Why Multi-Agent?
Single agents hit three categories of limitation in complex workflows. The first is context window exhaustion. A workflow that requires processing ten lengthy documents, synthesising them, drafting a response, and iterating on that draft based on validation rules may simply not fit in a 128K context — especially when every tool call and its result is logged in the context, and intermediate outputs are carried forward. Multi-agent systems address this by distributing work: each agent operates on a bounded subset of the total workflow with its own context.
The second is lack of specialisation. A general-purpose agent that does everything — research, analysis, writing, code review, compliance checking — will be less reliable at each individual task than a specialist agent scoped to that task alone. The system prompt for a specialist is tighter, its tool set is narrower and better suited to its function, and its evaluation criteria are more precise. A research specialist with access to web search, academic databases, and document readers outperforms a general agent asked to also write and validate on the same task.
The third is parallelisation. Some workflows contain subtasks that are logically independent and can run concurrently. A due diligence workflow might require financial analysis, legal document review, and market research to happen in parallel before the findings are synthesised. A single agent must do these sequentially; a multi-agent system can run them simultaneously and reduce total elapsed time significantly.
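The due diligence example above can be sketched with Python's asyncio. This is a minimal illustration, not a production implementation: the three worker functions are hypothetical stand-ins for LLM-backed agents, and the sleep calls stand in for model and tool latency.

```python
import asyncio

# Hypothetical worker agents -- each would wrap its own LLM calls in practice.
async def financial_analysis(target: str) -> str:
    await asyncio.sleep(0.1)  # stands in for model + tool latency
    return f"financial findings for {target}"

async def legal_review(target: str) -> str:
    await asyncio.sleep(0.1)
    return f"legal findings for {target}"

async def market_research(target: str) -> str:
    await asyncio.sleep(0.1)
    return f"market findings for {target}"

async def due_diligence(target: str) -> list[str]:
    # The three independent subtasks run concurrently; total elapsed time
    # is roughly the slowest worker, not the sum of all three.
    return await asyncio.gather(
        financial_analysis(target),
        legal_review(target),
        market_research(target),
    )

findings = asyncio.run(due_diligence("Acme Corp"))
```

The synthesis step would then run sequentially over `findings`, since it depends on all three results.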
The Orchestrator-Worker Pattern
The orchestrator-worker pattern is the most common and most practically useful multi-agent architecture. The orchestrator receives the overall goal, decomposes it into subtasks, delegates each subtask to the appropriate specialist worker agent, receives and aggregates the results, and produces the final output.
The orchestrator is itself an LLM, typically with a broader system prompt that describes the available workers, their capabilities, and the decomposition strategy for different task types. It does not execute tasks directly — it routes and coordinates. Workers are specialist agents with narrow, well-defined scopes: a researcher, an analyst, a writer, a validator. Each worker receives a specific task from the orchestrator and returns a specific result.
This separation has a practical advantage beyond the architectural one: each worker can be evaluated, improved, and debugged independently. If the writer’s output quality is poor, you can work on the writer’s system prompt and tool definitions without touching the researcher or the validator. This modularity is difficult to achieve in a monolithic single-agent system as scope expands.
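The control flow of the pattern can be reduced to a short sketch. The workers here are plain functions standing in for specialist LLM agents, and the decomposition is hard-coded where a real orchestrator would use a model call; all names are illustrative, not a real API.

```python
# Minimal orchestrator-worker sketch with a sequential handoff chain.
def researcher(task: str) -> str:
    return f"sources for: {task}"

def analyst(task: str) -> str:
    return f"synthesis of: {task}"

def writer(task: str) -> str:
    return f"draft based on: {task}"

WORKERS = {"research": researcher, "analyse": analyst, "write": writer}

def orchestrate(goal: str) -> str:
    # A real orchestrator would decompose the goal with an LLM; the plan is
    # hard-coded here to keep the control flow visible.
    plan = [("research", goal), ("analyse", None), ("write", None)]
    result = None
    for role, task in plan:
        # Each worker receives either its explicit task or the previous
        # worker's output -- the orchestrator routes, it never executes.
        result = WORKERS[role](task if task is not None else result)
    return result

final = orchestrate("impact of rate changes on SME lending")
```

Because each worker is an isolated unit behind a narrow interface, swapping or improving one does not disturb the others.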
Real Use Cases
Research and content pipelines are the cleanest illustration of orchestrator-worker in practice. A user submits a research request. The orchestrator identifies that this requires three distinct phases: source identification and retrieval (researcher agent, tools: web search, academic database query, document reader), synthesis and gap analysis (analyst agent, tools: comparison, cross-reference, structured summarisation), and writing and formatting (writer agent, tools: draft generation, style checking). An optional fourth agent — a reviewer — checks the final draft for factual consistency with the source documents before the output is returned.
Each agent does one thing well. The researcher does not draft. The writer does not retrieve. The reviewer does not synthesise. The orchestrator holds the process together.
Document processing workflows benefit from a similar structure. An extractor agent reads the incoming document and pulls structured fields. A validator agent checks those fields against known data types, formats, and business rules. A formatter agent produces the output in the required schema. An approver agent — or a human interrupt — handles exceptions that the validator flags. The same structure applies to contract review, invoice processing, and compliance document analysis.
Code review pipelines are a particularly compelling use case. A static analysis agent runs linting and type-checking tools and returns structured findings. A logic review agent reads the code and the associated requirements and assesses whether the implementation matches the specification. A security review agent applies known vulnerability patterns and checks for common issues. A summariser agent produces the final review comment that the developer sees, synthesising findings from all three reviewers into a coherent and prioritised output.
Communication Patterns
How agents share information with each other is a design decision with significant implications for system reliability and auditability.
Shared memory gives all agents read and write access to a common context store — typically a structured document or a key-value store that persists the task state across agent runs. This is the simplest approach for orchestrator-worker systems where the orchestrator controls the write order. The risk is write conflicts if multiple workers update the store simultaneously, which requires either sequential execution of workers that share a write target, or a locking mechanism.
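A locking mechanism of the kind described above can be as simple as a mutex around a key-value store. This is a sketch under the assumption that workers run as threads in one process; the class name and keys are illustrative.

```python
import threading

class TaskState:
    """Shared task state with a lock to serialise writes.

    A minimal sketch of the shared-memory pattern: all agents read and
    write one store, and the lock prevents concurrent write conflicts.
    """

    def __init__(self):
        self._state: dict = {}
        self._lock = threading.Lock()

    def write(self, key: str, value) -> None:
        with self._lock:  # only one worker updates the store at a time
            self._state[key] = value

    def read(self, key: str):
        with self._lock:
            return self._state.get(key)

state = TaskState()
state.write("research.findings", ["source A", "source B"])
findings = state.read("research.findings")
```

Distributed deployments would need a proper store (a database row with optimistic locking, for instance) rather than an in-process lock.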
Message passing uses explicit handoffs: the orchestrator sends each worker a structured payload containing the inputs it needs, and receives a structured response containing the worker’s outputs. There is no shared global state — each agent operates on what it has been given. This is more auditable (every handoff is logged as a discrete event) and easier to debug, at the cost of more explicit orchestration logic.
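A handoff payload can be made auditable by serialising every message as a discrete log event. The envelope below is a sketch; the field names are assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
import json

# Illustrative handoff envelope -- field names are assumptions, not a standard.
@dataclass
class Handoff:
    sender: str
    recipient: str
    task: str
    inputs: dict = field(default_factory=dict)

    def log_line(self) -> str:
        # Every handoff serialises to one auditable, replayable log event.
        return json.dumps({
            "from": self.sender,
            "to": self.recipient,
            "task": self.task,
            "inputs": self.inputs,
        })

msg = Handoff(
    sender="orchestrator",
    recipient="researcher",
    task="find primary sources",
    inputs={"topic": "SME lending"},
)
event = msg.log_line()
```

Because the worker sees only `inputs`, there is no hidden global state to reason about when replaying a failed run.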
Shared tool access allows multiple agents to call the same tools — the same search API, the same database — with the orchestrator responsible for coordinating which agent calls which tool and when. This is useful when different specialist agents need access to the same data sources but should not need to replicate tool definitions across workers.
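One way to keep tool definitions in a single place while still controlling which agent calls what is a shared registry plus a permission table. The tool functions and permission entries below are illustrative assumptions.

```python
# Sketch of shared tool access: one registry of tool functions, with the
# orchestrator's policy deciding which agent may call which tool.

def search(query: str) -> str:
    return f"results for {query}"   # stands in for a real search API

def read_db(table: str) -> str:
    return f"rows from {table}"     # stands in for a real database call

TOOLS = {"search": search, "read_db": read_db}

# Coordination policy: which agent may use which tool (illustrative).
PERMISSIONS = {
    "researcher": {"search"},
    "analyst": {"search", "read_db"},
}

def call_tool(agent: str, tool: str, arg: str) -> str:
    if tool not in PERMISSIONS.get(agent, set()):
        raise PermissionError(f"{agent} may not call {tool}")
    return TOOLS[tool](arg)

out = call_tool("analyst", "read_db", "invoices")
```

The tool definitions live once, but access is still mediated rather than ambient.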
In practice, most production multi-agent systems use a combination: shared tool access for data retrieval, message passing for structured handoffs between workers, and a shared task state document that the orchestrator updates as work progresses.
Failure Modes Unique to Multi-Agent Systems
Single-agent failure modes are well-understood: tool failures, context overflow, goal drift, hallucination. Multi-agent systems inherit all of these and add several that are specific to the multi-agent structure.
Error propagation is the most dangerous. If an early agent in the pipeline produces incorrect output — a researcher that misidentifies a source, an extractor that pulls the wrong field value — every downstream agent builds on that error. The analyst synthesises incorrect source material. The writer produces a confident, coherent document based on incorrect synthesis. The reviewer may not have access to the original data to detect the error. A single bad output early in the pipeline can poison the entire workflow, and the final output may show no obvious signs of the underlying error.
Conflicting outputs arise when multiple agents assess the same input from different perspectives and reach incompatible conclusions. A logic review agent concludes the implementation is correct; a security review agent concludes it introduces a vulnerability. The summariser must reconcile these — and may not be equipped to do so correctly without explicit guidance in its system prompt.
Coordination overhead is a real and often underestimated cost. Every agent handoff involves a model call. An orchestrator that makes poor decomposition decisions — routing tasks to workers that are not well-suited to them, or creating unnecessary sequential dependencies between tasks that could run in parallel — can make the multi-agent system slower and more expensive than a single agent handling the same task.
Infinite loops between agents are possible when the output of one agent triggers the input of another in a cycle without a defined termination condition. An orchestrator that repeatedly sends the same task to a worker because the worker’s output does not meet a quality threshold, without a maximum retry count, will loop indefinitely.
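The fix is a retry budget on the orchestrator's re-delegation. In this sketch the worker and quality check are placeholders for an LLM-based worker and evaluator; the threshold logic is contrived so the example terminates deterministically.

```python
# Guarding against the agent-to-agent loop described above: a hard cap on
# how many times the orchestrator re-sends the same task to a worker.
MAX_RETRIES = 3

def worker(task: str, attempt: int) -> str:
    return f"output v{attempt} for {task}"

def meets_quality_bar(output: str) -> bool:
    return "v3" in output  # placeholder: accepts only the third attempt

def delegate_with_budget(task: str) -> str:
    for attempt in range(1, MAX_RETRIES + 1):
        output = worker(task, attempt)
        if meets_quality_bar(output):
            return output
    # Defined termination condition: escalate instead of looping forever.
    raise RuntimeError(f"quality bar not met after {MAX_RETRIES} attempts")

result = delegate_with_budget("summarise findings")
```

The escalation path on budget exhaustion, whether to a human or to a fallback agent, matters as much as the cap itself.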
When Not to Use Multi-Agent
For tasks a well-scoped single agent can handle comfortably, multi-agent adds complexity without proportional value. The architectural overhead — orchestrator logic, communication protocols, per-agent evaluation, inter-agent error handling — is substantial. It is justified when the task genuinely requires specialisation, parallelisation, or exceeds single-agent context limits. It is not justified as a default architectural pattern or as a way of adding perceived sophistication to a system.
The progression should be: start with the simplest possible implementation. Graduate to multi-agent when you have evidence of a specific limitation — not an anticipated one, but one you have observed in production. The evidence might be context overflow on real tasks, measurably worse quality from a general agent versus a specialist on a specific subtask, or runtime that can be cut significantly by parallelising independent work.
Frameworks and the Custom Build Question
AutoGen (Microsoft), CrewAI, and LangGraph are the most actively developed multi-agent frameworks at the time of writing. AutoGen provides flexible agent communication patterns and is well-suited to research and code-generation workflows. CrewAI offers a higher-level abstraction with role-based agent definitions and built-in sequential and parallel execution. LangGraph provides fine-grained control over the agent loop as a directed graph, at the cost of more explicit configuration.
The tradeoff is consistent across frameworks: higher-level abstractions are faster to prototype with but harder to extend with precision. Most teams we work with that operate multi-agent systems in production have ended up with custom orchestration logic — often 300 to 500 lines of Python — built on top of direct LLM API calls, rather than adopting a heavy framework. The framework gets them to a working prototype faster; the custom build gives them the control they need for production reliability.
The framework-versus-custom decision depends on the stability of the workflow structure. If the workflow is well-defined and unlikely to change significantly, a framework is fine. If the workflow will evolve — new worker types, changing communication patterns, varying decomposition strategies — custom orchestration is easier to maintain.
The Architectural Discipline
Multi-agent systems are a structural choice, not a solution to the hard problems of agentic AI. The hard problems — reliable tool use, handling unexpected inputs, maintaining goal coherence across many steps, operating safely in adversarial conditions — exist in single-agent and multi-agent systems alike. What multi-agent adds is a specific set of capabilities (specialisation, parallelisation, context distribution) alongside a specific set of new problems (error propagation, conflicting outputs, coordination overhead).
Use it deliberately. Build the single-agent version first, understand exactly which limitation you are hitting, and design the multi-agent architecture to solve that specific limitation — not to solve all possible future limitations at once.
Related Reading
- Building Your First AI Agent: A Practical Guide — The single-agent foundation — tool design, orchestration loops, and human checkpoints — that you need to master before going multi-agent.
- Agentic AI Security: Prompt Injection and Containment — Security and containment complexity scales with multi-agent systems, making this a required read alongside any multi-agent architecture.
- Nematix Generative AI Services — How Nematix designs and delivers multi-agent workflows for enterprise clients in Malaysia and Southeast Asia.
Learn how Nematix’s Innovation Engineering services help businesses build production-ready AI systems.