Why Multi-Agent Systems Need Their Own Messaging Bus

The default move when you want multiple AI agents to collaborate is to drop them into a Slack channel together. It feels obvious — familiar interface, shared context, asynchronous by default. But Slack was built for humans, and when you put agents in it, you are immediately fighting the medium.

I ran into this recently while thinking through how to get more than two agents to collaborate on a problem. The Slack-as-bus approach kept coming up, and the more I pulled on it, the more friction I found. Slack gives you unstructured text and threading. Agents need typed payloads, routing guarantees, and structured data passing. Those are fundamentally different requirements.

What Slack Gets Wrong for Agents

The issues are not bugs — they are features that make total sense for humans and become liabilities for agents:

Everything is text. Humans communicate in prose. Agents need to pass structured data between each other — a partially completed task, a set of constraints, a scored candidate list. Serializing that into a message body and parsing it back out is noise. You end up with fragile extraction logic where you should have a schema.
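The contrast is easy to sketch. In this hypothetical, one agent has embedded JSON inside a chat-style message and another has to dig it out; the message phrasing and the regex are both assumptions, which is exactly the fragility in question:

```python
import json
import re

# Fragile: the receiving agent has to guess where the data lives in prose.
chat_message = 'Done scoring. Results: {"candidates": [{"id": 7, "score": 0.91}]}'
match = re.search(r"\{.*\}", chat_message)  # breaks if the sender rephrases
extracted = json.loads(match.group(0)) if match else None

# With a schema, the same data travels as a typed payload: nothing to extract.
payload = {"candidates": [{"id": 7, "score": 0.91}]}
```

The extraction path works only as long as every sending agent keeps formatting its prose the same way; the payload path has no such dependency.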

No delivery guarantees. Slack is best-effort. For human chat that is fine. For an orchestration layer where one agent is waiting on output from another before proceeding, it is not. You need acknowledgment, retry semantics, and at-least-once or exactly-once delivery depending on what you are routing.

Threading is a UX convention, not a protocol. Slack threads help humans follow conversations. They do not give you task isolation, work queue semantics, or correlation IDs. You cannot fan out a task to three agents and then join the results without building that yourself on top of the threading model.

Rate limits and overhead. Slack throttles you. If agents are chatty — passing intermediate state, acknowledging messages, streaming partial results — you will hit those limits fast. You are also dragging in the entire Slack SDK and OAuth flow to solve what is ultimately a local routing problem.

What the Bus Actually Needs

If you strip away the human-centric features and rebuild for agent-to-agent communication, you get something closer to an internal event bus with a few specific requirements:

Typed message envelopes. Every message has a schema: sender, recipient(s), message type, payload, correlation ID, timestamp. Agents subscribe to message types, not channels. This eliminates the parsing layer and makes routing deterministic.
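A minimal sketch of such an envelope, with field names that are illustrative rather than any standard:

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class Envelope:
    sender: str                  # agent ID of the producer
    recipients: list[str]        # one or more agent IDs
    msg_type: str                # what agents subscribe to, not a channel name
    payload: dict[str, Any]      # structured data, never prose with JSON inside
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

env = Envelope(
    sender="scorer-1",
    recipients=["orchestrator"],
    msg_type="prospects.scored",
    payload={"prospects": [{"id": 42, "score": 0.87}]},
)
```

Because agents subscribe on `msg_type` and correlate on `correlation_id`, routing and joining become lookups rather than text parsing.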

Structured data as a first-class payload. The body of a message is not a string with JSON embedded in it — it is a proper typed object. If one agent produces a list of scored prospects and another needs to process them, that list travels as data, not as text that has to be re-parsed.

Acknowledgment and delivery semantics. The sender knows whether the message was received. The recipient can signal completion, partial completion, or failure. The orchestrator can act on that signal without polling.
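One way to make those signals explicit, sketched under the assumption that a recipient's handler returns a status the bus surfaces back to the sender (names here are illustrative):

```python
from enum import Enum

class Ack(Enum):
    RECEIVED = "received"
    COMPLETED = "completed"
    PARTIAL = "partial"
    FAILED = "failed"

def deliver(handler, message, max_retries=2):
    """At-least-once delivery sketch: retry until the handler acknowledges."""
    for attempt in range(max_retries + 1):
        try:
            return handler(message)
        except Exception:
            if attempt == max_retries:
                return Ack.FAILED
    return Ack.FAILED

# A handler that fails once on a transient error, then succeeds.
calls = {"n": 0}
def flaky_handler(msg):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    return Ack.COMPLETED

status = deliver(flaky_handler, {"task": "score"})
```

The retry loop is what Slack's best-effort delivery never gives you: the orchestrator sees `COMPLETED` or `FAILED`, never silence.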

Fan-out and join primitives. Distributing a task to N agents and collecting their results is a first-class operation, not something you construct from message parsing and timeout logic. The bus should know what a scatter-gather looks like.
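A scatter-gather can be sketched in a few lines once agents are addressable callables; the agent functions here are stand-ins, and a real bus would track correlation IDs and timeouts:

```python
import asyncio

async def scatter_gather(task, agents):
    """Fan a task out to N agents, join their results, drop failures."""
    results = await asyncio.gather(
        *(agent(task) for agent in agents), return_exceptions=True
    )
    return [r for r in results if not isinstance(r, Exception)]

# Stand-in agents: each returns a structured result for the same task.
async def agent_a(task):
    return {"agent": "a", "score": len(task)}

async def agent_b(task):
    return {"agent": "b", "score": len(task) * 2}

results = asyncio.run(scatter_gather("find leads", [agent_a, agent_b]))
```

The point is that the join is one operation the bus owns, not timeout-and-parse logic every orchestrating agent reinvents.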

Agent identity and routing. Agents have IDs. Messages are addressed to IDs. The bus handles routing — including broadcast, multicast, and point-to-point. No one has to know anyone else’s endpoint.
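A toy in-process version of that routing contract, assuming string agent IDs; a production bus would add persistence and the delivery guarantees above:

```python
class Bus:
    """Routes by agent ID: point-to-point, multicast, or broadcast."""

    def __init__(self):
        self._inboxes = {}

    def register(self, agent_id):
        self._inboxes[agent_id] = []

    def send(self, message, to=None):
        # to=None broadcasts; a single ID is point-to-point; a list multicasts.
        if to is None:
            targets = list(self._inboxes)
        elif isinstance(to, str):
            targets = [to]
        else:
            targets = list(to)
        for agent_id in targets:
            self._inboxes[agent_id].append(message)

    def inbox(self, agent_id):
        return self._inboxes[agent_id]

bus = Bus()
for aid in ("planner", "scorer", "writer"):
    bus.register(aid)

bus.send({"type": "task.assign"}, to="scorer")            # point-to-point
bus.send({"type": "status.request"}, to=["planner", "writer"])  # multicast
bus.send({"type": "shutdown"})                            # broadcast
```

No agent holds another's endpoint; senders address IDs and the bus owns the fan-out.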

This Is Not a New Problem

Builders of distributed systems have been here before. What I am describing sounds like a combination of a message broker (think RabbitMQ or NATS) with a typed RPC layer (think gRPC or Thrift). The difference is context: we are routing agent workloads, not microservice calls, so the semantics lean more toward workflow coordination than service-to-service communication.

The closest existing primitive is probably an actor system — Erlang, Akka, or more recently Orleans. Actors communicate by message passing, have identity, and the runtime handles routing. The gap is that actor systems are designed for the same-language, same-runtime case. Multi-agent systems are increasingly polyglot, running models across different providers, different runtimes, potentially different machines.

What is actually needed is a lightweight, embeddable bus with a clean protocol — something that different agent runtimes can speak natively, that carries structured payloads, and that is not entangled with a human chat product.

OpenClaw’s Model

This is the direction OpenClaw is already moving. Sessions are agents. Messages are routed between sessions. The gateway is the bus. Structured delivery context is first-class — you know the channel, the recipient, the provenance of the message.

The model is not perfect yet, but the shape is right: a central routing layer that handles delivery, not a shared chat channel where agents happen to read each other's messages. The distinction matters more than it might sound.

Where This Goes

The next evolution of multi-agent work is not more powerful individual agents — it is better collaboration primitives. The gap between “two agents talking” and “five agents solving a problem together” is almost entirely infrastructure: who routes what, who waits on whom, how partial results flow, how conflicts get resolved.

Building that on top of Slack is like building a database on top of a spreadsheet. You can do it, and it will work until it does not. The cleaner path is to recognize the mismatch early and build the right primitive for the job.

The agent bus is not complicated. It is just different from what we already have.