AI/ML Configuration

Datafi's AI capabilities -- including natural-language Chat, autonomous Agents, and Insights -- rely on large language model (LLM) providers and supporting infrastructure that you configure at the tenant level. Navigate to Administration > AI/ML to manage these settings.


LLM Provider Selection

Each tenant can connect to one or more LLM providers. You select the active provider per feature (Chat, Agents, Insights) so that different workloads can use different models based on cost, latency, or compliance requirements.

| Provider | Authentication | Supported Models | Notes |
| --- | --- | --- | --- |
| OpenAI | API key | GPT-4o, GPT-4o mini, o1, o3-mini | Direct API access. Best for teams already using OpenAI. |
| Azure OpenAI | Azure AD / API key | GPT-4o, GPT-4o mini (deployed endpoints) | Data stays within your Azure tenant. Required for some compliance scenarios. |
| Amazon Bedrock | IAM role / access key | Claude (Anthropic), Titan, Llama | No model hosting required. Access models through your AWS account. |
| Snowflake Cortex | Snowflake session token | Cortex LLM functions | Runs within your Snowflake environment. Data never leaves Snowflake. |

Configuring a Provider

  1. Navigate to Administration > AI/ML > Providers.
  2. Click Add Provider.
  3. Select the provider type from the dropdown.
  4. Enter the required credentials (see the table above).
  5. Click Test Connection to verify access.
  6. Click Save.

Once a provider is configured, you assign it to specific AI features in the Feature Mapping section.

Tip: If your organization requires that data never leave a specific cloud boundary, choose Azure OpenAI (for Azure environments), Amazon Bedrock (for AWS environments), or Snowflake Cortex (for Snowflake-native processing).
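Before saving a provider, it can help to see which credential fields from the table above apply to each provider type. The sketch below is illustrative only — the field names and provider identifiers are assumptions, not Datafi's actual API:

```python
# Hypothetical mapping of provider types to the credential fields they
# need (per the table above). Names are illustrative, not Datafi internals.
REQUIRED_CREDENTIALS = {
    "openai": ["api_key"],
    "azure_openai": ["endpoint", "api_key"],  # or an Azure AD token
    "amazon_bedrock": ["region", "access_key_id", "secret_access_key"],
    "snowflake_cortex": ["account", "session_token"],
}

def missing_credentials(provider_type: str, supplied: dict) -> list[str]:
    """Return the credential fields still missing for a provider config."""
    required = REQUIRED_CREDENTIALS.get(provider_type)
    if required is None:
        raise ValueError(f"Unknown provider type: {provider_type}")
    return [field for field in required if not supplied.get(field)]

# A Bedrock config with only a region is incomplete:
print(missing_credentials("amazon_bedrock", {"region": "us-east-1"}))
# → ['access_key_id', 'secret_access_key']
```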

Feature Mapping

| Feature | Description | Recommended Provider |
| --- | --- | --- |
| Chat | Natural-language questions about your data. | Any provider with a conversational model. |
| Agents | Autonomous workflows that monitor and act on data. | Providers with function-calling support. |
| Insights | Automated trend detection, anomalies, and summaries. | High-throughput providers for batch processing. |
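Conceptually, feature mapping is a per-feature lookup from feature to configured provider. The sketch below shows that idea with made-up provider assignments; the names are assumptions for illustration, not a real tenant's configuration:

```python
# Illustrative feature-to-provider mapping, as configured under
# Administration > AI/ML > Feature Mapping. Values are examples only.
FEATURE_MAPPING = {
    "chat": "azure_openai",        # conversational model, compliance boundary
    "agents": "openai",            # function-calling support
    "insights": "amazon_bedrock",  # high-throughput batch processing
}

def provider_for(feature: str) -> str:
    """Resolve the active provider for an AI feature."""
    try:
        return FEATURE_MAPPING[feature]
    except KeyError:
        raise ValueError(f"No provider mapped for feature {feature!r}") from None
```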

Vector Database Configuration

Datafi uses a vector database to store embeddings for semantic search, schema matching, and retrieval-augmented generation (RAG). Currently, Redis (with the RediSearch module) is the only supported vector store.

| Parameter | Description | Default |
| --- | --- | --- |
| Host | The hostname or IP address of your Redis instance. | localhost |
| Port | The port on which Redis is listening. | 6379 |
| Password | Optional authentication password. | (none) |
| TLS Enabled | Whether the connection uses TLS encryption. | true |
| Index Prefix | A namespace prefix applied to all vector indices created by Datafi. | datafi: |
| Embedding Dimensions | The dimensionality of the embedding vectors. Must match your chosen embedding model. | 1536 |

To configure the vector database:

  1. Navigate to Administration > AI/ML > Vector Database.
  2. Enter your Redis connection details.
  3. Click Test Connection to confirm connectivity and module availability.
  4. Click Save.

Warning: Changing the Embedding Dimensions value after embeddings have been generated requires a full re-indexing of all stored vectors. Plan this operation during a maintenance window.


Model Parameters

You can fine-tune how Datafi interacts with LLMs by adjusting model parameters on a per-feature basis.

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| Temperature | 0.0 to 2.0 | 0.1 | Controls response randomness. Lower values produce more deterministic outputs. |
| Max Tokens | 1 to 128000 | 4096 | The maximum number of tokens in a single model response. |
| Top P | 0.0 to 1.0 | 0.9 | Nucleus sampling threshold. Lower values narrow the token selection pool. |
| Frequency Penalty | -2.0 to 2.0 | 0.0 | Reduces repetition of tokens that have already appeared. |
| System Prompt | Free text | Platform default | The system-level instruction prepended to every request. |

To adjust model parameters:

  1. Navigate to Administration > AI/ML > Model Parameters.
  2. Select the feature (Chat, Agents, or Insights) you want to configure.
  3. Modify the parameter values.
  4. Click Save.

Note: Parameter changes apply to new requests only. In-flight conversations retain the parameters that were active when the session started.
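This snapshot behaviour can be sketched as a session copying the active parameters at start time, so later edits don't affect it. Class and field names below are illustrative, not Datafi internals:

```python
class ChatSession:
    """Illustrative session that snapshots parameters when it starts."""
    def __init__(self, current_params: dict):
        self.params = dict(current_params)  # copy, not a shared reference

tenant_params = {"temperature": 0.1, "max_tokens": 4096}
session = ChatSession(tenant_params)

tenant_params["temperature"] = 0.7       # admin changes the setting later
print(session.params["temperature"])     # → 0.1: the in-flight session keeps
                                         # the value active at session start
```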


Next Steps