Agent Builder
The Agent Builder lets you create custom AI agents with precise control over what they do, how they reason, and what constraints they operate within. Every agent is defined by a declarative specification that covers four areas: Identity, Capabilities, Behavior, and Guard Rails.
Navigate to AI > Agent Builder to open the visual editor, or define agents programmatically using YAML specifications.
Agent Specification
Identity
Identity defines who the agent is and what it aims to accomplish.
| Field | Description | Example |
|---|---|---|
| Title | Display name shown in the UI. | Revenue Analyst |
| Name | Unique agent identifier, auto-generated from title. | revenue-analyst |
| Version | Semantic version string. | 1.2.0 |
| Description | Summary of the agent's purpose. | Analyzes quarterly revenue trends and generates executive summaries. |
| Author | Creator of the agent. | data-team |
| Tags | Labels for categorization and discovery. | ["analytics", "revenue", "reporting"] |
| Icon | Visual identifier -- choose from Material Symbols icons or upload a custom logo image. | Material icon: analytics |
| Goals | List of objectives the agent pursues. | ["Compute quarterly revenue by region", "Identify top growth segments"] |
| Success Criteria | Measurable conditions for a successful run. | ["Output contains revenue_by_region table", "Report generated in under 60s"] |
```yaml
identity:
  title: Revenue Analyst
  name: revenue-analyst
  version: 1.2.0
  description: Analyzes quarterly revenue trends and generates executive summaries.
  author: data-team
  tags:
    - analytics
    - revenue
    - reporting
  icon:
    type: material-symbol
    value: analytics
  goals:
    - Compute quarterly revenue by region
    - Identify top growth segments
    - Generate executive summary report
  success_criteria:
    - Output contains revenue_by_region table
    - Report generated in under 60 seconds
```
Capabilities
Capabilities define the tools, data sources, and output formats available to the agent.
Tools
| Tool | Description | Input | Output |
|---|---|---|---|
| query | Execute PRQL or SQL queries against connected data sources. | PRQL/SQL string, datasource ID | Table results |
| search | Semantic search across data catalog metadata. | Search query string | Matching tables, columns, descriptions |
| llm | Call the tenant's configured LLM for reasoning or generation. | Prompt string, parameters | Text response |
| vision | Analyze images using GPT-4 Vision. | Image URL or base64 | Text description, structured data |
| web_search | Search the web for external information. | Search query | Search results with URLs and snippets |
| web_fetch | Fetch and parse content from a URL. | URL | Page content as text or markdown |
| vision_extraction | Extract structured data from documents using GPT-4 Vision. | Document file, extraction schema | Structured JSON |
| http_api | Make HTTP requests to external APIs. | Method, URL, headers, body | HTTP response |
| email | Send email notifications or retrieve email content. | Recipients, subject, body | Delivery status |
| ftp | Upload or download files from FTP/SFTP servers. | Host, path, credentials | File content or transfer status |
| csv | Parse or generate CSV files. | File path or data array | Parsed rows or file path |
| json | Parse, transform, or generate JSON documents. | JSON string or object | Transformed JSON |
| markdown | Generate formatted markdown documents. | Content structure | Markdown string |
| markdown_table_formatter | Format data as HTML or markdown tables with column labels, sorting, and value formatting. | Data array, columns, format options | Formatted table string |
| regression | Perform statistical linear regression analysis on grouped data. | Data array, x/y columns, confidence level | Slope, intercept, R², confidence intervals |
| csv_writer | Generate CSV files from structured data. | Data array, column definitions | CSV file content |
| csv_formatter | Format and transform CSV data with column mapping. | CSV data, formatting rules | Formatted CSV |
| summarize | Summarize text content using LLM with configurable length. | Text content, target length | Summarized text |
| array | Perform map, filter, reduce, and sort operations on arrays using JQ expressions. | Array data, operation, expression | Transformed array |
Data Sources
You specify which connected data sources the agent can access. The agent can only query data sources listed in its specification, and all queries are further restricted by ABAC policies.
Output Formats
Define the formats the agent produces: text, json, markdown, table, or any combination.
```yaml
capabilities:
  tools:
    - query
    - llm
    - csv
    - markdown
  data_sources:
    - datasource: sales_warehouse
      permissions: read
    - datasource: crm_database
      permissions: read
  output_formats:
    - markdown
    - table
```
Behavior
Behavior controls how the agent reasons, executes, and remembers context.
Execution Mode
| Mode | Description | Best For |
|---|---|---|
| Sequential | Steps execute one after another. Each step's output is available to the next. | Linear analysis pipelines, step-by-step reasoning. |
| Parallel | Independent steps execute simultaneously. | Data gathering from multiple sources, batch processing. |
| Hybrid | Combines sequential and parallel execution. Groups of parallel steps feed into sequential stages. | Complex workflows that benefit from both patterns. |
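To make the hybrid pattern concrete, the sketch below groups two parallel data-gathering steps ahead of a sequential analysis stage. The steps, group, parallel, and sequential fields are hypothetical illustrations, not documented parts of the specification:

```yaml
# Hypothetical sketch of a hybrid execution plan.
# The steps/group/parallel/sequential fields are illustrative only;
# they are not documented fields of the agent specification.
behavior:
  execution_mode: hybrid
  steps:
    - group: gather
      parallel:               # both steps run simultaneously
        - fetch_sales_data
        - fetch_crm_data
    - group: analyze
      sequential:             # runs after the gather group completes
        - compute_revenue_by_region
        - generate_summary
```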
Reasoning Strategy
| Strategy | Description | Best For |
|---|---|---|
| Step-by-step | The agent reasons through the problem one step at a time, building on each conclusion. | Analytical tasks requiring logical progression. |
| Parallel exploration | The agent explores multiple approaches simultaneously and selects the best result. | Open-ended questions with multiple valid approaches. |
| Hypothesis-driven | The agent forms a hypothesis, tests it against data, and refines iteratively. | Investigative analysis, anomaly detection. |
| Depth-first | The agent pursues one line of inquiry deeply before considering alternatives. | Detailed root cause analysis. |
| Breadth-first | The agent surveys all options at a high level before diving into any single path. | Exploratory analysis, option comparison. |
Retry Policy
Configure how the agent handles failures. With exponential backoff, the delay doubles after each attempt, so an initial delay of 1000 ms yields waits of roughly 1, 2, and 4 seconds across three retries:
```yaml
behavior:
  execution_mode: hybrid
  reasoning_strategy: hypothesis-driven
  retry_policy:
    max_retries: 3
    backoff: exponential
    initial_delay_ms: 1000
  memory:
    type: hybrid
    short_term:
      max_turns: 20
    long_term:
      storage: tenant_memory_store
      ttl_days: 30
```
Memory
| Memory Type | Description | Persistence |
|---|---|---|
| Short-term | Context from the current execution. Includes step outputs and intermediate results. | Current run only. |
| Long-term | Persisted knowledge from previous runs. The agent can recall past findings. | Configurable TTL. |
| Hybrid | Combines short-term and long-term. The agent uses current context and historical knowledge. | Both scopes. |
Guard Rails
Guard rails constrain the agent's behavior to keep it safe, efficient, and compliant.
Resource Limits
| Resource | Description | Default | Maximum |
|---|---|---|---|
| Tokens | Total LLM tokens (input + output) per run. | 50,000 | 500,000 |
| Memory | Working memory allocation per run. | 256 MB | 2 GB |
| CPU | CPU time limit per run. | 60 s | 600 s |
| API calls | Maximum external API calls per run. | 50 | 500 |
Constraints
| Constraint | Description |
|---|---|
| PII filtering | Scrub personally identifiable information before sending data to an LLM. Configurable patterns and entity types. |
| SQL injection prevention | All generated queries pass through parameterized validation. Agents cannot execute raw, unvalidated SQL. |
| Approval requirements | Require human approval before executing specific tools (e.g., email, http_api) or when token usage exceeds a threshold. |
| Data source restrictions | Limit the agent to specific data sources, schemas, or tables. |
| Output redaction | Automatically redact sensitive values in agent output. |
```yaml
guard_rails:
  resource_limits:
    max_tokens: 100000
    max_memory_mb: 512
    max_cpu_seconds: 120
    max_api_calls: 100
  constraints:
    pii_filtering:
      enabled: true
      entities: [email, phone, ssn, credit_card]
    sql_injection_prevention: true
    approval_required:
      tools: [email, http_api]
      token_threshold: 80000
```
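The remaining two constraints, data source restrictions and output redaction, could be expressed in the same constraints block. The field names in this sketch are illustrative assumptions, not documented spec fields:

```yaml
# Illustrative sketch only -- data_source_restrictions and
# output_redaction are assumed field names, not documented ones.
guard_rails:
  constraints:
    data_source_restrictions:
      datasource: sales_warehouse
      schemas: [finance]
      tables: [orders, invoices]
    output_redaction:
      enabled: true
      patterns: ["\\d{3}-\\d{2}-\\d{4}"]   # e.g. SSN-shaped values
```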
Policy
Policy controls who can access and operate the agent. Each agent can define granular access control for four operations:
| Operation | Description |
|---|---|
| Read | View the agent specification and details. |
| Write | Edit and update the agent configuration. |
| Delete | Remove the agent from the catalog. |
| Run | Execute the agent manually or via triggers. |
```yaml
policy:
  read:
    roles: [viewer, operator, builder, admin]
  write:
    roles: [builder, admin]
  delete:
    roles: [admin]
  run:
    roles: [operator, builder, admin]
```
LLM Configuration
Each agent can specify which LLM to use, overriding the tenant default from the LLM Registry.
```yaml
llm:
  provider: openai
  model: gpt-4
  temperature: 0.3
  max_tokens: 4096
```
When no agent-level LLM is specified, the agent uses the tenant's default LLM for the text_generation capability.
Testing
The Agent Builder includes a built-in testing environment where you can validate your agent before publishing.
Test Runs
1. Click Test in the builder toolbar.
2. Provide sample input parameters.
3. The agent executes in a sandboxed environment with full logging.
4. Review each step: the reasoning, tool calls, intermediate results, and final output.
5. Inspect token usage, execution time, and resource consumption.
Assertions
Define assertions that automatically validate test run output:
```yaml
tests:
  - name: revenue_report_generates
    input:
      quarter: Q3
      year: 2025
    assertions:
      - output.format == "markdown"
      - output.contains("Revenue by Region")
      - execution_time_ms < 60000
```
Test Scenarios
Define complete test scenarios with mock data and evaluation metrics:
```yaml
tests:
  scenarios:
    - name: high_revenue_quarter
      description: Verify agent handles high-revenue data correctly
      input:
        quarter: Q3
        year: 2025
      expected_output:
        format: markdown
        contains: ["Revenue by Region"]
      assertions:
        - execution_time_ms < 60000
        - output.tables.length > 0
    - name: empty_dataset
      description: Verify agent handles empty results gracefully
      input:
        quarter: Q1
        year: 2020
      assertions:
        - output.contains("No data available")
  evaluation_metrics:
    - name: accuracy
      threshold: 0.95
    - name: latency_p99
      threshold: 30000
```
Test runs execute against your real data sources (with your access policies applied). This ensures the agent behaves correctly with production schemas and data volumes, not just mock data.
Monitoring
After publishing an agent, monitor its health and performance from AI > Agent Monitoring.
| Metric | Description |
|---|---|
| Run count | Total runs over a selected time period. |
| Success rate | Percentage of runs that completed successfully. |
| Average duration | Mean execution time across runs. |
| Token usage | Average and peak token consumption per run. |
| Error rate | Percentage of runs that failed, grouped by error type. |
| Resource utilization | CPU, memory, and API call usage relative to configured limits. |
When an agent consistently approaches its resource limits (above 80% utilization), consider increasing the limits or optimizing the agent's reasoning strategy to reduce token consumption.
Next Steps
- Workflow Builder -- Compose agents into multi-step workflows.
- Multi-Agent Coordination -- Coordinate multiple agents with shared state and event-driven patterns.
- Agent Catalog -- Publish your agent to the catalog for others to discover and use.