AI Infrastructure Overview

Datafi embeds AI capabilities directly into the data platform, giving you natural language querying, document intelligence, speech-to-text input, and autonomous agents -- all governed by the same access control policies that protect your data.

Rather than bolting on a separate AI layer, Datafi treats AI as a first-class component of the query and analysis pipeline. Every AI interaction passes through the Coordinator, which enforces ABAC policies, routes requests to the appropriate LLM provider, and ensures that generated queries only access data you are authorized to see.


LLM Providers

Datafi supports multiple LLM providers, configurable at the tenant level. You can select different providers for different workloads or switch providers without changing application code.

Each supported provider, with its models, typical use cases, and configuration requirements:

  • OpenAI -- Models: GPT-4, GPT-4 Vision. Use cases: NL-to-SQL, document extraction, general reasoning. Configuration: API key per tenant.
  • Azure OpenAI -- Models: GPT-4, GPT-4 Vision (Azure-hosted). Use cases: enterprises requiring Azure compliance and data residency. Configuration: Azure endpoint + API key.
  • AWS Bedrock -- Models: Claude, Titan. Use cases: organizations standardized on AWS infrastructure. Configuration: AWS IAM role or access key.
  • Snowflake Cortex -- Models: Cortex LLM functions. Use cases: teams with data already in Snowflake who want in-platform inference. Configuration: Snowflake connection credentials.
Provider Selection

You configure your preferred LLM provider in Administration > AI Settings. Each tenant can have a different provider, and you can change providers at any time without modifying your agents or workflows.
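The tenant-level selection described above can be sketched as a small registry that maps each tenant to its provider configuration. This is an illustrative model only -- the class and method names (`TenantAISettings`, `set_provider`, `provider_for`) are assumptions for this sketch, not Datafi's actual API:

```python
# Hypothetical sketch of tenant-scoped provider selection.
# Swapping a tenant's provider changes no application code.
from dataclasses import dataclass


@dataclass
class ProviderConfig:
    name: str          # e.g. "openai", "azure-openai", "bedrock", "cortex"
    credentials: dict  # API key, Azure endpoint, or AWS role, per the list above


class TenantAISettings:
    """Per-tenant provider registry; credentials are scoped to one tenant."""

    def __init__(self):
        self._providers: dict[str, ProviderConfig] = {}

    def set_provider(self, tenant_id: str, config: ProviderConfig) -> None:
        self._providers[tenant_id] = config

    def provider_for(self, tenant_id: str) -> ProviderConfig:
        return self._providers[tenant_id]


settings = TenantAISettings()
settings.set_provider("acme", ProviderConfig("azure-openai", {"endpoint": "...", "api_key": "***"}))
settings.set_provider("globex", ProviderConfig("bedrock", {"role_arn": "..."}))
print(settings.provider_for("acme").name)  # → azure-openai
```

Because callers only ever ask the registry for the current tenant's configuration, changing providers is a configuration change, not a code change.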


Core Capabilities

Datafi's AI infrastructure is organized into four capability areas, each covered in detail in subsequent pages.

Natural Language to SQL

Ask questions in plain English and receive query results. The NL-to-SQL pipeline assembles context from your schema, example queries, tenant customizations, and conversation history, then generates PRQL that compiles to database-specific SQL. Every generated query is validated against your access control policies before execution.

Key features:

  • Multi-turn conversation threads with context retention
  • Schema-aware context assembly for accurate query generation
  • PRQL intermediate representation for cross-database compatibility
  • Policy-enforced validation before any query reaches your data

See Chat Interface for the full pipeline walkthrough.
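The three pipeline stages (context assembly, PRQL generation, policy validation) can be sketched as follows. The function names here are assumptions for illustration, and the LLM call is stubbed out -- in the real pipeline, stage 2 is where the model generates PRQL:

```python
# Illustrative sketch of the NL-to-SQL pipeline stages described above.
# assemble_context / generate_prql / validate_against_policies are
# hypothetical names, not Datafi's real API.

def assemble_context(schema, examples, history):
    """Stage 1: gather schema, example queries, and conversation history."""
    return {"schema": schema, "examples": examples, "history": history}


def generate_prql(question, context):
    """Stage 2: an LLM would produce PRQL here; stubbed for illustration."""
    return f"from {context['schema'][0]} | take 10"


def validate_against_policies(prql, allowed_tables):
    """Stage 3: reject queries that touch tables the user cannot see."""
    return any(prql.startswith(f"from {t}") for t in allowed_tables)


context = assemble_context(["orders"], ["from orders | take 10"], [])
prql = generate_prql("How many orders shipped last week?", context)
assert validate_against_policies(prql, allowed_tables=["orders"])
```

The key property is that validation happens after generation but before execution: a generated query that references an unauthorized table is rejected rather than run.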

Document Intelligence

Extract structured data from PDFs, images, and scanned documents. Datafi uses GPT-4 Vision to process visual content and return structured JSON that you can store, query, or feed into downstream workflows.

Supported formats: TIFF, PNG, BMP, JPG, PDF

See Document Intelligence for details.
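Downstream code consumes the structured JSON that the vision model returns. A minimal sketch, assuming an invoice-extraction use case -- the `extract_invoice` helper and its field names are hypothetical, and in practice the raw JSON would come from a GPT-4 Vision response:

```python
# Hedged sketch: validating structured JSON returned by a vision model.
# Field names (invoice_number, total, currency) are illustrative.
import json


def extract_invoice(raw_model_output: str) -> dict:
    """Parse and minimally validate the model's JSON payload."""
    doc = json.loads(raw_model_output)
    required = {"invoice_number", "total", "currency"}
    missing = required - doc.keys()
    if missing:
        raise ValueError(f"model output missing fields: {missing}")
    return doc


sample = '{"invoice_number": "INV-1042", "total": 129.5, "currency": "USD"}'
invoice = extract_invoice(sample)
print(invoice["total"])  # → 129.5
```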

Autonomous Agents

Build and deploy agents that perform multi-step data tasks autonomously. Agents combine LLM reasoning with tool execution -- querying databases, calling APIs, processing documents, and generating reports -- all within configurable guard rails.

Key features:

  • Declarative agent specification (identity, capabilities, behavior, guard rails)
  • 13+ built-in tools (query, search, vision, web, HTTP, email, and more)
  • Configurable reasoning strategies (step-by-step, parallel exploration, hypothesis-driven)
  • Resource limits and PII filtering for safe autonomous operation

See Agent Catalog, Agent Builder, and Multi-Agent Coordination.
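Two of the guard rails listed above -- resource limits and PII filtering -- can be sketched in a simplified agent loop. Everything here is illustrative: the tool set, the step limit, and the SSN-shaped regex are assumptions, not Datafi's actual guard-rail implementation:

```python
# Simplified sketch of an agent loop with guard rails: a hard step limit
# and PII scrubbing applied to tool output before it would reach an LLM.
import re

PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # e.g. US SSN-shaped strings


def scrub_pii(text: str) -> str:
    return PII_PATTERN.sub("[REDACTED]", text)


def run_agent(task: str, tools: dict, max_steps: int = 5) -> list[str]:
    """Execute tools in order, enforcing the step limit and PII filter."""
    trace = []
    for step, (name, tool) in enumerate(tools.items()):
        if step >= max_steps:  # guard rail: resource limit
            break
        trace.append(f"{name}: {scrub_pii(tool(task))}")  # guard rail: PII filter
    return trace


tools = {
    "query": lambda t: "42 rows",
    "report": lambda t: "SSN 123-45-6789 found",
}
for line in run_agent("audit customer table", tools):
    print(line)
```

The SSN-shaped string in the second tool's output is redacted before it appears in the trace, so sensitive data never leaves the guard rail boundary.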

Workflow Engine

Orchestrate complex data pipelines using a graph-based workflow builder. Define nodes for actions, conditions, loops, parallel execution, and human-in-the-loop approvals. Workflows integrate directly with agents and the query engine.

See Workflow Builder for the complete reference.
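The graph-based model can be sketched as nodes that each transform state and choose the next edge. This is a minimal illustrative shape, not Datafi's workflow engine -- node names and the state-passing convention are assumptions:

```python
# Minimal sketch of a graph-based workflow: each node returns the updated
# state and the name of the next node (None terminates the walk).

def run_workflow(nodes, start, state):
    """Walk the graph from `start`, letting each node pick the next edge."""
    current = start
    while current is not None:
        state, current = nodes[current](state)
    return state


nodes = {
    "load":  lambda s: ({**s, "rows": 120}, "check"),                 # action node
    "check": lambda s: (s, "alert" if s["rows"] > 100 else None),     # condition node
    "alert": lambda s: ({**s, "alerted": True}, None),                # action node
}
final = run_workflow(nodes, "load", {})
print(final)  # → {'rows': 120, 'alerted': True}
```

Condition nodes choose between edges, action nodes mutate state; loops and parallel branches extend the same pattern.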


How AI Fits into the Platform

Every AI interaction flows through the Coordinator and the ABAC engine. Whether you type a question, upload a document, or trigger an agent, the platform enforces your governance policies at every step.


Security and Governance

AI capabilities inherit the full security model of the Datafi platform:

  • Access control -- Generated queries are validated against ABAC policies before execution. An AI-generated query cannot access data that the requesting user is not authorized to see.
  • Tenant isolation -- LLM provider credentials and configuration are scoped to each tenant. One tenant's AI traffic never crosses into another tenant's provider.
  • PII filtering -- Agents can be configured with PII filtering guard rails that scrub sensitive data before it reaches an LLM.
  • Audit logging -- Every AI interaction, including the generated query, the LLM provider used, and the validation result, is recorded in the audit log.
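As a sketch of the audit-logging bullet above, a single record might capture the user, provider, generated query, and validation result. The field names here are assumptions for illustration, not Datafi's actual audit schema:

```python
# Hypothetical shape of one audit record for an AI interaction.
import json
from datetime import datetime, timezone


def audit_record(user, provider, generated_query, validation_passed):
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "llm_provider": provider,
        "generated_query": generated_query,
        "validation_result": "allowed" if validation_passed else "denied",
    }


rec = audit_record("analyst@example.com", "azure-openai",
                   "from orders | take 10", True)
print(json.dumps(rec, indent=2))
```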

Next Steps