# Multi-Agent Coordination
When a single agent is not enough, you can coordinate multiple agents to tackle complex data tasks. Datafi provides coordination patterns that let agents communicate, share state, and work together -- either as peers or in a hierarchy. You configure coordination at the workflow level, and the platform handles message routing, state synchronization, and lifecycle management.
## Coordination Patterns
Datafi supports four coordination patterns. You can use them individually or combine them within a single workflow.
| Pattern | Description | Communication | Best For |
|---|---|---|---|
| Event-driven | Agents react to platform events. An event triggers one or more agents without direct coupling between them. | Publish/subscribe via event bus. | Decoupled pipelines, reactive architectures, monitoring. |
| Message-passing | Agents send and receive typed messages directly. One agent's output becomes another agent's input. | Point-to-point or broadcast messages. | Sequential handoffs, data enrichment chains, review workflows. |
| Shared state | Agents read from and write to a shared state store. Each agent contributes partial results to a common data structure. | Read/write to shared key-value store. | Collaborative analysis, aggregation from multiple sources, consensus-building. |
| Hierarchical | A supervisor agent delegates tasks to worker agents, collects results, and makes decisions. | Parent-child task delegation. | Complex orchestration, divide-and-conquer, multi-stage pipelines. |
### Event-Driven Pattern
Agents subscribe to event types. When an event is published (by the platform, a workflow, or another agent), all subscribed agents are triggered independently.
```yaml
coordination:
  pattern: event-driven
  events:
    - type: data.loaded
      filter: "source == 'sales_warehouse'"
      agents:
        - data-quality-checker
        - schema-drift-detector
    - type: anomaly.detected
      filter: "severity >= 'high'"
      agents:
        - incident-reporter
        - auto-remediation-agent
```
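The mechanics of this pattern can be sketched in a few lines of Python (a minimal illustration, not the Datafi runtime; the agent bodies are placeholders):

```python
# Minimal event-bus sketch: agents subscribe to an event type with an
# optional filter predicate; a published event triggers every matching
# subscriber independently, with no coupling between subscribers.
class EventBus:
    def __init__(self):
        self.subscriptions = []  # list of (event_type, filter_fn, agent_fn)

    def subscribe(self, event_type, agent_fn, filter_fn=None):
        self.subscriptions.append((event_type, filter_fn, agent_fn))

    def publish(self, event_type, payload):
        triggered = []
        for etype, filter_fn, agent_fn in self.subscriptions:
            if etype == event_type and (filter_fn is None or filter_fn(payload)):
                agent_fn(payload)                 # each agent runs on its own
                triggered.append(agent_fn.__name__)
        return triggered

def data_quality_checker(event):
    pass  # placeholder for the real agent

def schema_drift_detector(event):
    pass  # placeholder for the real agent

bus = EventBus()
for agent in (data_quality_checker, schema_drift_detector):
    bus.subscribe("data.loaded", agent,
                  filter_fn=lambda e: e["source"] == "sales_warehouse")

ran = bus.publish("data.loaded", {"source": "sales_warehouse"})
skipped = bus.publish("data.loaded", {"source": "hr_system"})  # filtered out
```

Because subscribers never reference each other, adding a third agent to the `data.loaded` event requires no change to the existing two.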
### Message-Passing Pattern
Agents exchange structured messages. The sender specifies the recipient and message schema; the recipient processes the message and optionally responds.
```yaml
coordination:
  pattern: message-passing
  flow:
    - from: data-collector
      to: data-enricher
      message:
        schema: raw_records
    - from: data-enricher
      to: report-generator
      message:
        schema: enriched_records
    - from: report-generator
      to: email-distributor
      message:
        schema: formatted_report
```
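The handoff semantics can be sketched as follows (illustrative only; the agent bodies are placeholders and the email-distributor step is omitted):

```python
# Each agent returns a message that becomes the next agent's input,
# in the order declared in the flow.
def data_collector(_msg):
    return {"schema": "raw_records", "rows": [{"id": 1}, {"id": 2}]}

def data_enricher(msg):
    rows = [dict(r, region="NA") for r in msg["rows"]]  # add derived fields
    return {"schema": "enriched_records", "rows": rows}

def report_generator(msg):
    return {"schema": "formatted_report", "row_count": len(msg["rows"])}

def run_flow(flow, message=None):
    for agent in flow:
        message = agent(message)  # sender's output is the recipient's input
    return message

report = run_flow([data_collector, data_enricher, report_generator])
```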
### Shared State Pattern
Agents read from and write to a shared state store during execution. The store is scoped to the workflow run and supports concurrent access with conflict resolution.
```yaml
coordination:
  pattern: shared-state
  state_store:
    type: key-value
    conflict_resolution: last-write-wins
  agents:
    - name: revenue-analyzer
      writes: ["revenue_by_region", "revenue_trends"]
    - name: cost-analyzer
      writes: ["cost_by_region", "cost_trends"]
    - name: profitability-summarizer
      reads: ["revenue_by_region", "cost_by_region", "revenue_trends", "cost_trends"]
      writes: ["profitability_report"]
```
### Hierarchical Pattern
A supervisor agent breaks a complex task into subtasks, delegates them to worker agents, collects results, and synthesizes a final output.
```yaml
coordination:
  pattern: hierarchical
  supervisor: executive-analyst
  workers:
    - name: sales-analyst
      task: "Analyze Q3 sales performance"
    - name: marketing-analyst
      task: "Analyze Q3 campaign effectiveness"
    - name: ops-analyst
      task: "Analyze Q3 operational efficiency"
  aggregation:
    strategy: supervisor-synthesis
    timeout_seconds: 300
```
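The delegate-collect-synthesize loop can be sketched as follows (the worker names and tasks mirror the config above; the analysis bodies are placeholders):

```python
# The supervisor fans tasks out to workers, collects each result, and
# then runs a synthesis step over the combined findings.
def sales_analyst(task):
    return {"agent": "sales-analyst", "finding": f"done: {task}"}

def marketing_analyst(task):
    return {"agent": "marketing-analyst", "finding": f"done: {task}"}

def ops_analyst(task):
    return {"agent": "ops-analyst", "finding": f"done: {task}"}

assignments = [
    (sales_analyst, "Analyze Q3 sales performance"),
    (marketing_analyst, "Analyze Q3 campaign effectiveness"),
    (ops_analyst, "Analyze Q3 operational efficiency"),
]

def supervise(assignments):
    results = [worker(task) for worker, task in assignments]  # delegate subtasks
    # synthesis step: the supervisor combines the workers' findings
    return {"summary": [r["finding"] for r in results], "workers": len(results)}

report = supervise(assignments)
```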
## Agent Versioning
Every agent in Datafi is versioned using semantic versioning (major.minor.patch). Versioning lets you evolve agents safely without disrupting running workflows.
| Action | Behavior |
|---|---|
| Publish a new version | The previous version remains available. Existing workflows continue using the version they reference. |
| Pin a version | Workflows and triggers reference a specific version (e.g., `revenue-analyst@1.2.0`). |
| Use latest | Reference `revenue-analyst@latest` to always use the most recently published version. |
| Deprecate a version | Mark a version as deprecated. Existing references still work, but new workflows cannot select it. |
| Rollback | Revert to a previous version by updating the workflow reference. |
```yaml
agents:
  - name: revenue-analyst
    version: "1.2.0"   # pinned version
  - name: cost-analyzer
    version: "latest"  # always uses newest
```
## A/B Testing
Datafi supports A/B testing for agents, allowing you to compare two versions of an agent side by side in production. Traffic is split between versions based on a configurable ratio, and results are tracked independently for each version.
### Setting Up an A/B Test
1. Navigate to AI > Agent Catalog and select the agent you want to test.
2. Click A/B Test and select the two versions to compare.
3. Configure the traffic split (e.g., 80/20, 50/50).
4. Define comparison metrics (success rate, execution time, token usage, output quality).
5. Set a test duration or sample size threshold.
6. Start the test.
### Monitoring A/B Results
| Metric | Description (tracked separately for each version) |
|---|---|
| Success rate | Percentage of successful runs. |
| Average duration | Mean execution time. |
| Token usage | Average tokens per run. |
| Quality score | Based on user feedback (thumbs up/down). |
```yaml
ab_test:
  agent: revenue-analyst
  versions:
    a: "1.2.0"
    b: "1.3.0-beta"
  traffic_split:
    a: 80
    b: 20
  metrics:
    - success_rate
    - avg_duration_ms
    - avg_token_usage
    - quality_score
  duration_days: 14
```
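One common way to implement such a split is to hash a stable identifier into a bucket (a sketch under that assumption; Datafi's actual assignment scheme is not documented here):

```python
# Deterministic 80/20 split: hashing the run id means the same run
# always lands on the same version, even across retries.
import hashlib
from collections import Counter

def assign_version(run_id, split_a=80):
    bucket = int(hashlib.sha256(run_id.encode()).hexdigest(), 16) % 100
    return "a" if bucket < split_a else "b"

counts = Counter(assign_version(f"run-{i}") for i in range(1000))
# counts["a"] lands close to 800 and counts["b"] close to 200
```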
The A/B testing dashboard indicates when results reach statistical significance. Avoid drawing conclusions from small sample sizes -- wait until the dashboard confirms confidence before promoting a version.
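The dashboard's exact method is not documented here, but a standard two-proportion z-test illustrates what "statistical significance" means for success rates:

```python
# |z| > 1.96 corresponds to roughly 95% confidence that the two
# versions' success rates genuinely differ.
import math

def z_score(successes_a, runs_a, successes_b, runs_b):
    p_a, p_b = successes_a / runs_a, successes_b / runs_b
    pooled = (successes_a + successes_b) / (runs_a + runs_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / runs_a + 1 / runs_b))
    return (p_a - p_b) / se

z = z_score(790, 1000, 180, 250)   # 79.0% vs 72.0% success
significant = abs(z) > 1.96
```

Note how the standard error shrinks with sample size: the same 7-point gap that is significant at these volumes would not be at a tenth of the traffic, which is why small samples are misleading.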
## Lifecycle Triggers
Agents and multi-agent workflows can be triggered automatically through multiple mechanisms.
| Trigger Type | Configuration | Description |
|---|---|---|
| Manual | None (on-demand) | Triggered by a user from the catalog, API, or workflow. |
| Polling | `interval`, `backoff` | Checks a condition at regular intervals. Supports exponential backoff to reduce load when the condition is not met. |
| Schedule | `cron`, `timezone` | Runs on a cron schedule in the specified timezone. |
| Event | `type`, `filter` | Fires when a matching platform event occurs. Supports filter expressions. |
| Webhook | `path`, `auth` | Fires when an HTTP request hits the configured webhook path. Supports API key and JWT authentication. |
```yaml
triggers:
  - type: schedule
    cron: "0 8 * * MON"
    timezone: "America/New_York"
  - type: event
    event_type: data.loaded
    filter: "source == 'sales_warehouse' && row_count > 0"
  - type: webhook
    path: /hooks/revenue-report
    auth:
      type: api_key
      header: X-API-Key
  - type: polling
    interval_seconds: 300
    backoff:
      type: exponential
      max_interval_seconds: 3600
    condition: "SELECT COUNT(*) FROM staging.pending WHERE status = 'ready'"
```
## Observability
Multi-agent coordination provides additional observability beyond single-agent monitoring:
- Coordination trace -- Visualize the full message flow between agents, including event publications, message handoffs, and state reads/writes.
- Agent dependency graph -- See which agents depend on which, based on coordination patterns.
- Bottleneck detection -- Identify agents that slow down the overall workflow due to long execution times or frequent retries.
- State inspector -- View the contents of the shared state store at any point during execution.
## Design Considerations
When designing multi-agent systems, keep these principles in mind:
- Prefer loose coupling -- Use event-driven or shared state patterns when agents do not need direct interaction. This makes it easier to add, remove, or replace agents without disrupting the system.
- Set clear boundaries -- Each agent should have a single, well-defined responsibility. Avoid creating "super agents" that do everything.
- Use guard rails consistently -- Apply resource limits and PII filtering to every agent, not just the entry point. A single unconstrained agent can compromise the entire workflow.
- Version deliberately -- Pin versions in production workflows. Use `latest` only in development and testing environments.
- Test coordination -- Test multi-agent workflows end-to-end, not just individual agents. Message schemas, state keys, and event filters can introduce subtle failures that only appear during coordination.
## Next Steps
- Agent Builder -- Create the agents that participate in multi-agent workflows.
- Workflow Builder -- Define the graph-based workflows that orchestrate coordination.
- Agent Catalog -- Browse available agents and their versions.
- AI Infrastructure Overview -- Review the platform's AI architecture and LLM provider configuration.