AI Infrastructure Overview

Datafi embeds AI capabilities directly into the data platform, giving you natural language querying, document intelligence, speech-to-text input, and autonomous agents -- all governed by the same access control policies that protect your data.

Rather than bolting on a separate AI layer, Datafi treats AI as a first-class component of the query and analysis pipeline. Every AI interaction passes through the Coordinator, which enforces ABAC policies, routes requests to the appropriate LLM provider, and ensures that generated queries only access data you are authorized to see.


LLM Providers

Datafi supports multiple LLM providers, configurable at the tenant level. You can select different providers for different workloads or switch providers without changing application code.

Each supported provider, with its models, typical use cases, and configuration requirements:

  • OpenAI -- Models: GPT-4, GPT-4 Vision. Use cases: NL-to-SQL, document extraction, general reasoning. Configuration: API key per tenant.
  • Azure OpenAI -- Models: GPT-4, GPT-4 Vision (Azure-hosted). Use cases: enterprises requiring Azure compliance and data residency. Configuration: Azure endpoint + API key.
  • AWS Bedrock -- Models: Claude, Titan. Use cases: organizations standardized on AWS infrastructure. Configuration: AWS IAM role or access key.
  • Snowflake Cortex -- Models: Cortex LLM functions. Use cases: teams with data already in Snowflake who want in-platform inference. Configuration: Snowflake connection credentials.
Provider Selection

You configure your preferred LLM provider in Administration > AI Settings. Each tenant can have a different provider, and you can change providers at any time without modifying your agents or workflows.
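The tenant-level selection described above can be sketched as a small registry that maps each tenant to its provider configuration. This is an illustrative model only -- the class and method names (`TenantAISettings`, `set_provider`, `provider_for`) are assumptions for this sketch, not Datafi's actual API:

```python
# Hypothetical sketch of tenant-scoped provider selection.
# Swapping a tenant's provider changes no application code.
from dataclasses import dataclass


@dataclass
class ProviderConfig:
    name: str          # e.g. "openai", "azure-openai", "bedrock", "cortex"
    credentials: dict  # API key, Azure endpoint, or AWS role, per the list above


class TenantAISettings:
    """Per-tenant provider registry; credentials are scoped to one tenant."""

    def __init__(self):
        self._providers: dict[str, ProviderConfig] = {}

    def set_provider(self, tenant_id: str, config: ProviderConfig) -> None:
        self._providers[tenant_id] = config

    def provider_for(self, tenant_id: str) -> ProviderConfig:
        return self._providers[tenant_id]


settings = TenantAISettings()
settings.set_provider("acme", ProviderConfig("azure-openai", {"endpoint": "...", "api_key": "***"}))
settings.set_provider("globex", ProviderConfig("bedrock", {"role_arn": "..."}))
print(settings.provider_for("acme").name)  # → azure-openai
```

Because callers only ever ask the registry for the current tenant's configuration, changing providers is a configuration change, not a code change.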


Core Capabilities

Datafi's AI infrastructure is organized into four capability areas, each covered in detail in subsequent pages.

Natural Language to SQL

Ask questions in plain English and receive query results. The NL-to-SQL pipeline assembles context from your schema, example queries, tenant customizations, and conversation history, then generates PRQL that compiles to database-specific SQL. Every generated query is validated against your access control policies before execution.

Key features:

  • Multi-turn conversation threads with context retention
  • Schema-aware context assembly for accurate query generation
  • PRQL intermediate representation for cross-database compatibility
  • Policy-enforced validation before any query reaches your data

See Chat Interface for the full pipeline walkthrough.
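The three pipeline stages (context assembly, PRQL generation, policy validation) can be sketched as follows. The function names here are assumptions for illustration, and the LLM call is stubbed out -- in the real pipeline, stage 2 is where the model generates PRQL:

```python
# Illustrative sketch of the NL-to-SQL pipeline stages described above.
# assemble_context / generate_prql / validate_against_policies are
# hypothetical names, not Datafi's real API.

def assemble_context(schema, examples, history):
    """Stage 1: gather schema, example queries, and conversation history."""
    return {"schema": schema, "examples": examples, "history": history}


def generate_prql(question, context):
    """Stage 2: an LLM would produce PRQL here; stubbed for illustration."""
    return f"from {context['schema'][0]} | take 10"


def validate_against_policies(prql, allowed_tables):
    """Stage 3: reject queries that touch tables the user cannot see."""
    return any(prql.startswith(f"from {t}") for t in allowed_tables)


context = assemble_context(["orders"], ["from orders | take 10"], [])
prql = generate_prql("How many orders shipped last week?", context)
assert validate_against_policies(prql, allowed_tables=["orders"])
```

The key property is that validation happens after generation but before execution: a generated query that references an unauthorized table is rejected rather than run.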

Document Intelligence

Extract structured data from PDFs, images, and scanned documents. Datafi uses GPT-4 Vision to process visual content and return structured JSON that you can store, query, or feed into downstream workflows.

Supported formats: TIFF, PNG, BMP, JPG, PDF

See Document Intelligence for details.
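Downstream code consumes the structured JSON that the vision model returns. A minimal sketch, assuming an invoice-extraction use case -- the `extract_invoice` helper and its field names are hypothetical, and in practice the raw JSON would come from a GPT-4 Vision response:

```python
# Hedged sketch: validating structured JSON returned by a vision model.
# Field names (invoice_number, total, currency) are illustrative.
import json


def extract_invoice(raw_model_output: str) -> dict:
    """Parse and minimally validate the model's JSON payload."""
    doc = json.loads(raw_model_output)
    required = {"invoice_number", "total", "currency"}
    missing = required - doc.keys()
    if missing:
        raise ValueError(f"model output missing fields: {missing}")
    return doc


sample = '{"invoice_number": "INV-1042", "total": 129.5, "currency": "USD"}'
invoice = extract_invoice(sample)
print(invoice["total"])  # → 129.5
```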

Autonomous Agents

Build and deploy agents that perform multi-step data tasks autonomously. Agents combine LLM reasoning with tool execution -- querying databases, calling APIs, processing documents, and generating reports -- all within configurable guard rails.

Key features:

  • Declarative agent specification (identity, capabilities, behavior, guard rails)
  • 13+ built-in tools (query, search, vision, web, HTTP, email, and more)
  • Configurable reasoning strategies (step-by-step, parallel exploration, hypothesis-driven)
  • Resource limits and PII filtering for safe autonomous operation

See Agent Catalog, Agent Builder, and Multi-Agent Coordination.
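Two of the guard rails listed above -- resource limits and PII filtering -- can be sketched in a simplified agent loop. Everything here is illustrative: the tool set, the step limit, and the SSN-shaped regex are assumptions, not Datafi's actual guard-rail implementation:

```python
# Simplified sketch of an agent loop with guard rails: a hard step limit
# and PII scrubbing applied to tool output before it would reach an LLM.
import re

PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # e.g. US SSN-shaped strings


def scrub_pii(text: str) -> str:
    return PII_PATTERN.sub("[REDACTED]", text)


def run_agent(task: str, tools: dict, max_steps: int = 5) -> list[str]:
    """Execute tools in order, enforcing the step limit and PII filter."""
    trace = []
    for step, (name, tool) in enumerate(tools.items()):
        if step >= max_steps:  # guard rail: resource limit
            break
        trace.append(f"{name}: {scrub_pii(tool(task))}")  # guard rail: PII filter
    return trace


tools = {
    "query": lambda t: "42 rows",
    "report": lambda t: "SSN 123-45-6789 found",
}
for line in run_agent("audit customer table", tools):
    print(line)
```

The SSN-shaped string in the second tool's output is redacted before it appears in the trace, so sensitive data never leaves the guard rail boundary.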

Workflow Engine

Orchestrate complex data pipelines using a graph-based workflow builder. Define nodes for actions, conditions, loops, parallel execution, and human-in-the-loop approvals. Workflows integrate directly with agents and the query engine.

See Workflow Builder for the complete reference.
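The graph-based model can be sketched as nodes that each transform state and choose the next edge. This is a minimal illustrative shape, not Datafi's workflow engine -- node names and the state-passing convention are assumptions:

```python
# Minimal sketch of a graph-based workflow: each node returns the updated
# state and the name of the next node (None terminates the walk).

def run_workflow(nodes, start, state):
    """Walk the graph from `start`, letting each node pick the next edge."""
    current = start
    while current is not None:
        state, current = nodes[current](state)
    return state


nodes = {
    "load":  lambda s: ({**s, "rows": 120}, "check"),                 # action node
    "check": lambda s: (s, "alert" if s["rows"] > 100 else None),     # condition node
    "alert": lambda s: ({**s, "alerted": True}, None),                # action node
}
final = run_workflow(nodes, "load", {})
print(final)  # → {'rows': 120, 'alerted': True}
```

Condition nodes choose between edges, action nodes mutate state; loops and parallel branches extend the same pattern.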


How AI Fits into the Platform

Every AI interaction flows through the Coordinator and the ABAC engine. Whether you type a question, upload a document, or trigger an agent, the platform enforces your governance policies at every step.


Security and Governance

AI capabilities inherit the full security model of the Datafi platform:

  • Access control -- Generated queries are validated against ABAC policies before execution. An AI-generated query cannot access data that the requesting user is not authorized to see.
  • Tenant isolation -- LLM provider credentials and configuration are scoped to each tenant. One tenant's AI traffic never crosses into another tenant's provider.
  • PII filtering -- Agents can be configured with PII filtering guard rails that scrub sensitive data before it reaches an LLM.
  • Audit logging -- Every AI interaction, including the generated query, the LLM provider used, and the validation result, is recorded in the audit log.
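As a sketch of the audit-logging bullet above, a single record might capture the user, provider, generated query, and validation result. The field names here are assumptions for illustration, not Datafi's actual audit schema:

```python
# Hypothetical shape of one audit record for an AI interaction.
import json
from datetime import datetime, timezone


def audit_record(user, provider, generated_query, validation_passed):
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "llm_provider": provider,
        "generated_query": generated_query,
        "validation_result": "allowed" if validation_passed else "denied",
    }


rec = audit_record("analyst@example.com", "azure-openai",
                   "from orders | take 10", True)
print(json.dumps(rec, indent=2))
```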

Next Steps