
Multi-Agent Coordination

When a single agent is not enough, you can coordinate multiple agents to tackle complex data tasks. Datafi provides coordination patterns that let agents communicate, share state, and work together -- either as peers or in a hierarchy. You configure coordination at the workflow level, and the platform handles message routing, state synchronization, and lifecycle management.


Coordination Patterns

Datafi supports four coordination patterns. You can use them individually or combine them within a single workflow.

| Pattern | Description | Communication | Best For |
| --- | --- | --- | --- |
| Event-driven | Agents react to platform events. An event triggers one or more agents without direct coupling between them. | Publish/subscribe via event bus. | Decoupled pipelines, reactive architectures, monitoring. |
| Message-passing | Agents send and receive typed messages directly. One agent's output becomes another agent's input. | Point-to-point or broadcast messages. | Sequential handoffs, data enrichment chains, review workflows. |
| Shared state | Agents read from and write to a shared state store. Each agent contributes partial results to a common data structure. | Read/write to shared key-value store. | Collaborative analysis, aggregation from multiple sources, consensus-building. |
| Hierarchical | A supervisor agent delegates tasks to worker agents, collects results, and makes decisions. | Parent-child task delegation. | Complex orchestration, divide-and-conquer, multi-stage pipelines. |

Event-Driven Pattern

Agents subscribe to event types. When an event is published (by the platform, a workflow, or another agent), all subscribed agents are triggered independently.

coordination:
  pattern: event-driven
  events:
    - type: data.loaded
      filter: "source == 'sales_warehouse'"
      agents:
        - data-quality-checker
        - schema-drift-detector
    - type: anomaly.detected
      filter: "severity >= 'high'"
      agents:
        - incident-reporter
        - auto-remediation-agent
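
To make the publish/subscribe behavior concrete, here is a minimal Python sketch of an event bus whose subscriptions carry a filter. It is an illustration only, not Datafi's implementation: the `EventBus` class, its methods, and the use of Python callables in place of filter expressions are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Predicate = Callable[[Event], bool]


class EventBus:
    """Toy publish/subscribe bus (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # Maps an event type to its (agent name, filter predicate) subscriptions.
        self._subs: Dict[str, List[Tuple[str, Predicate]]] = defaultdict(list)

    def subscribe(self, event_type: str, agent: str, predicate: Predicate) -> None:
        self._subs[event_type].append((agent, predicate))

    def publish(self, event_type: str, payload: Event) -> List[str]:
        """Return the agents whose filter matches; each would run independently."""
        return [agent for agent, pred in self._subs[event_type] if pred(payload)]


def from_sales_warehouse(event: Event) -> bool:
    # Stand-in for the filter expression "source == 'sales_warehouse'".
    return event.get("source") == "sales_warehouse"


bus = EventBus()
bus.subscribe("data.loaded", "data-quality-checker", from_sales_warehouse)
bus.subscribe("data.loaded", "schema-drift-detector", from_sales_warehouse)

triggered = bus.publish("data.loaded", {"source": "sales_warehouse"})
ignored = bus.publish("data.loaded", {"source": "hr_warehouse"})
```

The publisher never names the subscribers, which is what keeps the agents decoupled: adding a third agent only requires another `subscribe` call.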

Message-Passing Pattern

Agents exchange structured messages. The sender specifies the recipient and message schema; the recipient processes the message and optionally responds.

coordination:
  pattern: message-passing
  flow:
    - from: data-collector
      to: data-enricher
      message:
        schema: raw_records
    - from: data-enricher
      to: report-generator
      message:
        schema: enriched_records
    - from: report-generator
      to: email-distributor
      message:
        schema: formatted_report
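
The handoff chain can be pictured as functions that validate each message against a named schema before passing it on. The Python sketch below is hypothetical: the schema contents (`records`, `region`), the `validate` helper, and the sample values are invented for illustration, since the actual schema definitions are not shown here.

```python
from typing import Dict, Set

Message = Dict[str, object]

# Hypothetical schemas: each schema name maps to the keys a message must carry.
SCHEMAS: Dict[str, Set[str]] = {
    "raw_records": {"records"},
    "enriched_records": {"records", "region"},
}


def validate(schema: str, message: Message) -> Message:
    """Raise ValueError if the message lacks a key required by the schema."""
    missing = SCHEMAS[schema] - set(message)
    if missing:
        raise ValueError(f"{schema}: missing keys {sorted(missing)}")
    return message


def data_collector() -> Message:
    # Made-up sample data standing in for collected records.
    return validate("raw_records", {"records": [100, 250]})


def data_enricher(message: Message) -> Message:
    validate("raw_records", message)
    return validate("enriched_records", {**message, "region": "EMEA"})


def report_generator(message: Message) -> str:
    validate("enriched_records", message)
    return f"{message['region']}: total {sum(message['records'])}"


# One agent's output becomes the next agent's input.
report = report_generator(data_enricher(data_collector()))
```

Validating on both the sending and receiving side means a schema mismatch fails at the handoff where it occurs rather than several agents later.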

Shared State Pattern

Agents read from and write to a shared state store during execution. The store is scoped to the workflow run and supports concurrent access with conflict resolution.

coordination:
  pattern: shared-state
  state_store:
    type: key-value
    conflict_resolution: last-write-wins
  agents:
    - name: revenue-analyzer
      writes: ["revenue_by_region", "revenue_trends"]
    - name: cost-analyzer
      writes: ["cost_by_region", "cost_trends"]
    - name: profitability-summarizer
      reads: ["revenue_by_region", "cost_by_region", "revenue_trends", "cost_trends"]
      writes: ["profitability_report"]
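
As a rough mental model of `last-write-wins`, the Python sketch below uses a lock-protected dictionary in which a later write to a key replaces the earlier one. The `SharedState` class and the regional figures are made up for illustration; this is not how Datafi's store is implemented.

```python
import threading
from typing import Any, Dict, Iterable


class SharedState:
    """Toy run-scoped key-value store with last-write-wins semantics (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def write(self, key: str, value: Any) -> None:
        # Last-write-wins: a later write replaces whatever was stored before.
        with self._lock:
            self._data[key] = value

    def read_all(self, keys: Iterable[str]) -> Dict[str, Any]:
        with self._lock:
            return {key: self._data[key] for key in keys}


state = SharedState()
state.write("revenue_by_region", {"EMEA": 120, "APAC": 90})
state.write("cost_by_region", {"EMEA": 70, "APAC": 65})
state.write("cost_by_region", {"EMEA": 80, "APAC": 65})  # replaces the earlier write

# The summarizer reads the partial results and contributes its own key.
inputs = state.read_all(["revenue_by_region", "cost_by_region"])
profitability_report = {
    region: inputs["revenue_by_region"][region] - inputs["cost_by_region"][region]
    for region in inputs["revenue_by_region"]
}
state.write("profitability_report", profitability_report)
```

Because a second write silently discards the first, giving each writer its own keys (as the configuration above does) avoids losing results.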

Hierarchical Pattern

A supervisor agent breaks a complex task into subtasks, delegates them to worker agents, collects results, and synthesizes a final output.

coordination:
  pattern: hierarchical
  supervisor: executive-analyst
  workers:
    - name: sales-analyst
      task: "Analyze Q3 sales performance"
    - name: marketing-analyst
      task: "Analyze Q3 campaign effectiveness"
    - name: ops-analyst
      task: "Analyze Q3 operational efficiency"
  aggregation:
    strategy: supervisor-synthesis
    timeout_seconds: 300
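
The delegate, collect, and synthesize loop can be sketched with Python's standard `concurrent.futures` module. This is a simplified stand-in rather than Datafi's orchestration logic: `run_supervisor`, `synthesize`, and the placeholder worker outputs are hypothetical names and values.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict


def run_supervisor(
    workers: Dict[str, Callable[[], str]], timeout_seconds: float
) -> Dict[str, str]:
    """Run all workers in parallel and keep results that arrive before the timeout."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = {pool.submit(fn): name for name, fn in workers.items()}
        done, _not_done = wait(list(futures), timeout=timeout_seconds)
        # Note: leaving the `with` block still waits for unfinished workers;
        # a real orchestrator would cancel or abandon them instead.
        return {futures[future]: future.result() for future in done}


def synthesize(results: Dict[str, str]) -> str:
    """Combine worker outputs into one final output, in a stable order."""
    return "; ".join(f"{name}: {results[name]}" for name in sorted(results))


# Placeholder workers; real ones would perform the analysis tasks.
workers = {
    "sales-analyst": lambda: "sales summary",
    "marketing-analyst": lambda: "campaign summary",
    "ops-analyst": lambda: "operations summary",
}
summary = synthesize(run_supervisor(workers, timeout_seconds=300))
```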

Agent Versioning

Every agent in Datafi is versioned using semantic versioning (major.minor.patch). Versioning lets you evolve agents safely without disrupting running workflows.

| Action | Behavior |
| --- | --- |
| Publish a new version | The previous version remains available. Existing workflows continue using the version they reference. |
| Pin a version | Workflows and triggers reference a specific version (e.g., revenue-analyst@1.2.0). |
| Use latest | Reference revenue-analyst@latest to always use the most recently published version. |
| Deprecate a version | Mark a version as deprecated. Existing references still work, but new workflows cannot select it. |
| Rollback | Revert to a previous version by updating the workflow reference. |

agents:
  - name: revenue-analyst
    version: "1.2.0" # pinned version
  - name: cost-analyzer
    version: "latest" # always uses newest
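
The resolution rules in the table can be summarized in a few lines of Python. This is a sketch under assumptions: the `PublishedVersion` record, the `resolve` and `selectable` helpers, the sample history, and the choice to model "latest" as the last entry of a publish-ordered list are illustrative, not part of Datafi's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PublishedVersion:
    version: str
    deprecated: bool = False


def resolve(reference: str, history: List[PublishedVersion]) -> str:
    """Resolve 'agent@version' against a publish history ordered oldest first."""
    _agent, _, wanted = reference.partition("@")
    if wanted == "latest":
        return history[-1].version  # the most recently published version
    for entry in history:
        if entry.version == wanted:
            return entry.version  # deprecated versions still resolve when referenced
    raise LookupError(f"{reference}: version was never published")


def selectable(history: List[PublishedVersion]) -> List[str]:
    """Versions a new workflow may choose: everything that is not deprecated."""
    return [entry.version for entry in history if not entry.deprecated]


history = [
    PublishedVersion("1.1.0", deprecated=True),
    PublishedVersion("1.2.0"),
    PublishedVersion("1.3.0"),
]
pinned = resolve("revenue-analyst@1.2.0", history)
latest = resolve("revenue-analyst@latest", history)
legacy = resolve("revenue-analyst@1.1.0", history)
```

Under this model a rollback is simply a change of reference, for example from `1.3.0` back to `1.2.0`, with no change to the publish history.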

A/B Testing

Datafi supports A/B testing for agents, allowing you to compare two versions of an agent side by side in production. Traffic is split between versions based on a configurable ratio, and results are tracked independently for each version.

Setting Up an A/B Test

  1. Navigate to AI > Agent Catalog and select the agent you want to test.
  2. Click A/B Test and select the two versions to compare.
  3. Configure the traffic split (e.g., 80/20, 50/50).
  4. Define comparison metrics (success rate, execution time, token usage, output quality).
  5. Set a test duration or sample size threshold.
  6. Start the test.
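
One common way to implement a traffic split like the one configured in step 3 is to hash a stable identifier into 100 buckets. The Python sketch below shows that technique; whether Datafi assigns runs this way is not stated here, and `assign_version` and the `run-N` identifiers are hypothetical.

```python
import hashlib


def assign_version(run_id: str, split_a: int) -> str:
    """Map a run ID to 'a' or 'b' so that roughly split_a percent of runs get 'a'."""
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    return "a" if bucket < split_a else "b"


# Simulate an 80/20 split over 10,000 hypothetical run IDs.
assignments = [assign_version(f"run-{i}", split_a=80) for i in range(10_000)]
share_a = assignments.count("a") / len(assignments)
```

Hashing rather than drawing a random number means the same identifier always maps to the same version, which keeps per-version results cleanly separated.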

Monitoring A/B Results

| Metric | Version A | Version B |
| --- | --- | --- |
| Success rate | Percentage of successful runs. | Percentage of successful runs. |
| Average duration | Mean execution time. | Mean execution time. |
| Token usage | Average tokens per run. | Average tokens per run. |
| Quality score | Based on user feedback (thumbs up/down). | Based on user feedback (thumbs up/down). |

ab_test:
  agent: revenue-analyst
  versions:
    a: "1.2.0"
    b: "1.3.0-beta"
  traffic_split:
    a: 80
    b: 20
  metrics:
    - success_rate
    - avg_duration_ms
    - avg_token_usage
    - quality_score
  duration_days: 14

Statistical Significance

The A/B testing dashboard indicates when results reach statistical significance. Avoid drawing conclusions from small sample sizes -- wait until the dashboard reports significance before promoting a version.
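
To see why small samples are unreliable, consider a two-proportion z-test on success rates, a standard significance test. The Python sketch below is illustrative only: the method the dashboard actually uses is not documented here, and the function name and run counts are invented.

```python
import math


def two_proportion_p_value(
    success_a: int, runs_a: int, success_b: int, runs_b: int
) -> float:
    """Two-sided p-value for the difference between two success rates."""
    rate_a = success_a / runs_a
    rate_b = success_b / runs_b
    pooled = (success_a + success_b) / (runs_a + runs_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / runs_a + 1 / runs_b))
    if std_err == 0:
        return 1.0  # identical all-success or all-failure samples: no evidence
    z = (rate_a - rate_b) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed rates (90% vs 70%), very different sample sizes.
small_sample_p = two_proportion_p_value(9, 10, 7, 10)
large_sample_p = two_proportion_p_value(900, 1000, 700, 1000)
```

With 10 runs per version the 20-point gap does not clear the conventional 0.05 threshold, while the same gap over 1,000 runs per version does.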


Lifecycle Triggers

Agents and multi-agent workflows can be triggered automatically through multiple mechanisms.

| Trigger Type | Configuration | Description |
| --- | --- | --- |
| Manual | None (on-demand) | Triggered by a user from the catalog, API, or workflow. |
| Polling | interval, backoff | Checks a condition at regular intervals. Supports exponential backoff to reduce load when the condition is not met. |
| Schedule | cron, timezone | Runs on a cron schedule in the specified timezone. |
| Event | type, filter | Fires when a matching platform event occurs. Supports filter expressions. |
| Webhook | path, auth | Fires when an HTTP request hits the configured webhook path. Supports API key and JWT authentication. |

triggers:
  - type: schedule
    cron: "0 8 * * MON"
    timezone: "America/New_York"

  - type: event
    event_type: data.loaded
    filter: "source == 'sales_warehouse' && row_count > 0"

  - type: webhook
    path: /hooks/revenue-report
    auth:
      type: api_key
      header: X-API-Key

  - type: polling
    interval_seconds: 300
    backoff:
      type: exponential
      max_interval_seconds: 3600
    condition: "SELECT COUNT(*) FROM staging.pending WHERE status = 'ready'"
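
For the polling trigger, exponential backoff with a cap can be worked through numerically. The Python sketch below assumes the interval doubles after each poll that finds the condition unmet; the doubling factor and the `next_interval` helper are assumptions, since the configuration above specifies only the base interval and the maximum.

```python
def next_interval(base_seconds: int, max_seconds: int, misses: int) -> int:
    """Seconds to wait after `misses` consecutive polls with the condition unmet."""
    return min(base_seconds * 2 ** misses, max_seconds)


# Values taken from the polling trigger above: 300 s base, 3600 s cap.
schedule = [next_interval(300, 3600, misses) for misses in range(6)]
# schedule == [300, 600, 1200, 2400, 3600, 3600]
```

Under this assumption the poll interval reaches the one-hour cap after four consecutive misses and stays there until the condition is met.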

Observability

Multi-agent coordination provides additional observability beyond single-agent monitoring:

  • Coordination trace -- Visualize the full message flow between agents, including event publications, message handoffs, and state reads/writes.
  • Agent dependency graph -- See which agents depend on which, based on coordination patterns.
  • Bottleneck detection -- Identify agents that slow down the overall workflow due to long execution times or frequent retries.
  • State inspector -- View the contents of the shared state store at any point during execution.

Design Considerations

When designing multi-agent systems, keep these principles in mind:

  1. Prefer loose coupling -- Use event-driven or shared state patterns when agents do not need direct interaction. This makes it easier to add, remove, or replace agents without disrupting the system.
  2. Set clear boundaries -- Each agent should have a single, well-defined responsibility. Avoid creating "super agents" that do everything.
  3. Use guard rails consistently -- Apply resource limits and PII filtering to every agent, not just the entry point. A single unconstrained agent can compromise the entire workflow.
  4. Version deliberately -- Pin versions in production workflows. Use latest only in development and testing environments.
  5. Test coordination -- Test multi-agent workflows end-to-end, not just individual agents. Message schemas, state keys, and event filters can introduce subtle failures that only appear during coordination.

Next Steps