AI Infrastructure Overview
Datafi embeds AI capabilities directly into the data platform, giving you natural language querying, document intelligence, speech-to-text input, and autonomous agents -- all governed by the same access control policies that protect your data.
Rather than bolting on a separate AI layer, Datafi treats AI as a first-class component of the query and analysis pipeline. Every AI interaction passes through the Coordinator, which enforces ABAC policies, routes requests to the appropriate LLM provider, and ensures that generated queries only access data you are authorized to see.
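To make that flow concrete, the sketch below shows the gating logic in miniature: a policy check before the model call, provider routing, and validation of whatever the model generates before execution. It is a simplified illustration only; the names (`AIRequest`, `abac_allows`, `route_to_provider`) are hypothetical and do not correspond to Datafi APIs.

```python
# Simplified illustration; every name here is hypothetical, not a Datafi API.
from dataclasses import dataclass


@dataclass
class AIRequest:
    tenant_id: str
    user_id: str
    prompt: str


def abac_allows(user_id: str, resource: str) -> bool:
    """Stand-in for the ABAC engine's policy decision."""
    return True  # the real engine evaluates user and data attributes against tenant policies


def route_to_provider(tenant_id: str, prompt: str) -> str:
    """Stand-in for routing the prompt to the tenant's configured LLM provider."""
    return "generated query text"


def handle(request: AIRequest) -> str:
    # 1. The request is policy-checked before any model call is made.
    if not abac_allows(request.user_id, resource="ai:chat"):
        raise PermissionError("request denied by access control policy")
    # 2. The prompt is routed to the tenant's configured provider.
    generated = route_to_provider(request.tenant_id, request.prompt)
    # 3. The generated query is validated against policy before it can execute.
    if not abac_allows(request.user_id, resource=generated):
        raise PermissionError("generated query touches unauthorized data")
    return generated
```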
LLM Providers
Datafi supports multiple LLM providers, configurable at the tenant level. You can select different providers for different workloads or switch providers without changing application code.
| Provider | Models | Use Cases | Configuration |
|---|---|---|---|
| OpenAI | GPT-4, GPT-4 Vision | NL-to-SQL, document extraction, general reasoning | API key per tenant |
| Azure OpenAI | GPT-4, GPT-4 Vision (Azure-hosted) | Enterprises requiring Azure compliance and data residency | Azure endpoint + API key |
| AWS Bedrock | Claude, Titan | Organizations standardized on AWS infrastructure | AWS IAM role or access key |
| Snowflake Cortex | Cortex LLM functions | Teams with data already in Snowflake who want in-platform inference | Snowflake connection credentials |
You configure your preferred LLM provider in Administration > AI Settings. Each tenant can have a different provider, and you can change providers at any time without modifying your agents or workflows.
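As a mental model, you can think of the per-tenant settings as a small record naming the provider, model, and credential reference. The shape below is purely illustrative; the actual AI Settings schema is managed through the Administration UI, and the field names and values here are placeholders.

```python
# Illustrative only: field names are hypothetical and endpoints are placeholders.
TENANT_AI_SETTINGS = {
    "tenant-a": {
        "provider": "azure_openai",
        "model": "gpt-4",
        "endpoint": "https://example-resource.openai.azure.com",
        "credential_ref": "secret://tenant-a/azure-openai-key",
    },
    "tenant-b": {
        "provider": "aws_bedrock",
        "model": "claude",
        "credential_ref": "secret://tenant-b/bedrock-iam-role",
    },
}


def provider_for(tenant_id: str) -> dict:
    # Switching providers means changing this record, not the agents or
    # workflows that depend on it.
    return TENANT_AI_SETTINGS[tenant_id]
```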
Core Capabilities
Datafi's AI infrastructure is organized into four capability areas, each covered in detail in subsequent pages.
Natural Language to SQL
Ask questions in plain English and receive query results. The NL-to-SQL pipeline assembles context from your schema, example queries, tenant customizations, and conversation history, then generates PRQL that compiles to database-specific SQL. Every generated query is validated against your access control policies before execution.
Key features:
- Multi-turn conversation threads with context retention
- Schema-aware context assembly for accurate query generation
- PRQL intermediate representation for cross-database compatibility
- Policy-enforced validation before any query reaches your data
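A rough sketch of that pipeline, expressed as plain Python, is shown below. The function names and the PRQL-style query are illustrative only and are not part of the Datafi API.

```python
# Conceptual pipeline sketch; all names and the PRQL-style query are illustrative.
def assemble_context(schema: dict, examples: list[str], history: list[str]) -> str:
    """Combine schema, example queries, tenant customizations, and history into a prompt."""
    return "\n".join([repr(schema), *examples, *history])


def generate_prql(context: str, question: str) -> str:
    """Stand-in for the LLM call that returns a PRQL-style intermediate query."""
    return 'from orders\nfilter region == "EMEA"\naggregate { total = sum amount }'


def validate_against_policy(query: str, user_id: str) -> bool:
    """Stand-in for the ABAC validation applied before execution."""
    return True


def answer(question: str, user_id: str, schema: dict) -> str:
    context = assemble_context(schema, examples=[], history=[])
    prql = generate_prql(context, question)
    if not validate_against_policy(prql, user_id):
        raise PermissionError("query blocked by access control policy")
    # The PRQL is compiled to the target database's SQL dialect and executed.
    return prql
```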
See Chat Interface for the full pipeline walkthrough.
Document Intelligence
Extract structured data from PDFs, images, and scanned documents. Datafi uses GPT-4 Vision to process visual content and return structured JSON that you can store, query, or feed into downstream workflows.
Supported formats: TIFF, PNG, BMP, JPG, PDF
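The extracted structure depends on your document type and extraction prompt. The invoice-style record below is a hypothetical example of the kind of JSON you might get back, not a fixed output schema.

```python
# Hypothetical example of an extraction result; the fields are illustrative,
# not a fixed Datafi output schema.
example_extraction = {
    "document_type": "invoice",
    "vendor": "Acme Corp",
    "invoice_number": "INV-1042",
    "invoice_date": "2024-03-15",
    "line_items": [
        {"description": "Widget", "quantity": 12, "unit_price": 4.50},
    ],
    "total": 54.00,
}
# A record like this can be stored, queried, or passed to a downstream workflow.
```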
See Document Intelligence for details.
Autonomous Agents
Build and deploy agents that perform multi-step data tasks autonomously. Agents combine LLM reasoning with tool execution -- querying databases, calling APIs, processing documents, and generating reports -- all within configurable guard rails.
Key features:
- Declarative agent specification (identity, capabilities, behavior, guard rails)
- 13+ built-in tools (query, search, vision, web, HTTP, email, and more)
- Configurable reasoning strategies (step-by-step, parallel exploration, hypothesis-driven)
- Resource limits and PII filtering for safe autonomous operation
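To show how those pieces fit together, here is a hypothetical agent specification expressed as a Python dictionary. The keys echo the concepts above (identity, capabilities, behavior, guard rails) but do not reproduce the actual Agent Builder schema.

```python
# Hypothetical specification; keys are illustrative, not the Agent Builder schema.
agent_spec = {
    "identity": {
        "name": "weekly-sales-summary",
        "description": "Summarizes weekly sales and emails the result",
    },
    "capabilities": ["query", "search", "email"],  # a subset of the built-in tools
    "behavior": {
        "reasoning": "step_by_step",  # or parallel exploration, hypothesis-driven
        "max_steps": 10,
    },
    "guard_rails": {
        "pii_filtering": True,  # scrub sensitive values before they reach the LLM
        "resource_limits": {"max_queries": 25, "timeout_seconds": 300},
    },
}
```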
See Agent Catalog, Agent Builder, and Multi-Agent Coordination.
Workflow Engine
Orchestrate complex data pipelines using a graph-based workflow builder. Define nodes for actions, conditions, loops, parallel execution, and human-in-the-loop approvals. Workflows integrate directly with agents and the query engine.
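As an intuition for the graph model, the sketch below describes a small pipeline as nodes and edges. The node types mirror the concepts above (action, condition, approval), but the schema itself is hypothetical and is not the Workflow Builder's format.

```python
# Hypothetical node-and-edge representation; not the Workflow Builder's format.
workflow = {
    "nodes": [
        {"id": "extract",  "type": "action",    "agent": "document-extractor"},
        {"id": "validate", "type": "condition", "expr": "extract.total > 0"},
        {"id": "load",     "type": "action",    "agent": "warehouse-loader"},
        {"id": "review",   "type": "approval",  "assignee": "finance-team"},
    ],
    "edges": [
        ("extract", "validate"),
        ("validate", "load"),    # branch taken when the condition is true
        ("validate", "review"),  # human-in-the-loop branch otherwise
    ],
}
```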
See Workflow Builder for the complete reference.
How AI Fits into the Platform
Every AI interaction flows through the Coordinator and the ABAC engine. Whether you type a question, upload a document, or trigger an agent, the platform enforces your governance policies at every step.
Security and Governance
AI capabilities inherit the full security model of the Datafi platform:
- Access control -- Generated queries are validated against ABAC policies before execution. An AI-generated query cannot access data that the requesting user is not authorized to see.
- Tenant isolation -- LLM provider credentials and configuration are scoped to each tenant; one tenant's AI traffic is never routed through another tenant's provider.
- PII filtering -- Agents can be configured with PII filtering guard rails that scrub sensitive data before it reaches an LLM.
- Audit logging -- Every AI interaction, including the generated query, the LLM provider used, and the validation result, is recorded in the audit log.
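For a sense of what that audit trail captures, the record below is a hypothetical example shaped around the fields described above (generated query, provider, validation result); the actual log schema may differ.

```python
# Hypothetical audit record; fields follow the description above, but the
# actual log schema may differ.
audit_event = {
    "tenant_id": "tenant-a",
    "user_id": "analyst@example.com",
    "interaction": "nl_to_sql",
    "provider": "azure_openai",
    "generated_query": "SELECT region, SUM(amount) FROM orders GROUP BY region",
    "policy_validation": "allowed",
    "timestamp": "2024-03-15T10:42:00Z",
}
```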
Next Steps
- Chat Interface -- Learn how the NL-to-SQL pipeline works and how to use conversational querying.
- Document Intelligence -- Extract structured data from PDFs and images.
- Agent Catalog -- Browse and run pre-built agents.
- Agent Builder -- Create custom agents with full control over behavior and guard rails.
- Workflow Builder -- Orchestrate multi-step data pipelines.
- Multi-Agent Coordination -- Coordinate multiple agents with event-driven patterns.