Snowflake Integration
Datafi connects to Snowflake through a native driver that supports both JWT and OAuth authentication. Once connected, you can query Snowflake data alongside other sources using Datafi's federated query engine. Queries authored in PRQL are compiled to Snowflake-compatible SQL before execution.
Authentication Methods
Datafi supports two authentication methods for Snowflake. Choose the one that aligns with your organization's security requirements.
JWT (Key Pair Authentication)
JWT authentication uses an RSA key pair: Datafi signs a JWT with your private key, and Snowflake validates the signature against the public key registered to your Snowflake user.
Setup steps:
- Generate an RSA key pair (2048-bit minimum):

  ```shell
  openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
  openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
  ```

- Register the public key with your Snowflake user:

  ```sql
  ALTER USER datafi_service_user SET RSA_PUBLIC_KEY='MIIBIjAN...';
  ```

- In Datafi, navigate to Integrations > Add Data Source > Snowflake.
- Select JWT as the authentication method.
- Provide your Snowflake account identifier, username, and upload the private key file.
- Click Test Connection, then Save.
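Behind the scenes, Snowflake expects the JWT's issuer claim to embed a SHA-256 fingerprint of the registered public key. The sketch below shows how that fingerprint and claim set are shaped; the function names are illustrative (Datafi builds and signs the token for you, and actual signing would use a JWT library):

```python
import base64
import hashlib
import time

def key_fingerprint(public_key_der: bytes) -> str:
    """SHA-256 fingerprint of the DER-encoded public key, in the
    SHA256:<base64> form Snowflake reports as RSA_PUBLIC_KEY_FP."""
    digest = hashlib.sha256(public_key_der).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii")

def jwt_claims(account: str, user: str, public_key_der: bytes,
               lifetime_s: int = 3600) -> dict:
    """Claim set for Snowflake key-pair JWTs: the issuer carries the
    key fingerprint, the subject is <ACCOUNT>.<USER>."""
    qualified_user = f"{account.upper()}.{user.upper()}"
    now = int(time.time())
    return {
        "iss": f"{qualified_user}.{key_fingerprint(public_key_der)}",
        "sub": qualified_user,
        "iat": now,
        "exp": now + lifetime_s,
    }
```

If a connection fails with a token error, comparing this fingerprint against `DESCRIBE USER` output is a quick way to spot a public key mismatch.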
OAuth
OAuth authentication delegates credential management to your identity provider. Datafi acts as an OAuth client and obtains access tokens through the authorization code flow.
Setup steps:
- Register Datafi as an OAuth client in your Snowflake account:

  ```sql
  CREATE SECURITY INTEGRATION datafi_oauth
    TYPE = OAUTH
    OAUTH_CLIENT = CUSTOM
    OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
    OAUTH_REDIRECT_URI = 'https://home.datafi.us/oauth/callback'
    ENABLED = TRUE;
  ```

- Retrieve the client ID and client secret from the integration.
- In Datafi, select OAuth as the authentication method.
- Enter the client ID, client secret, and your Snowflake account identifier.
- Click Authorize to complete the OAuth flow.
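The authorization code flow ends with a POST to Snowflake's token endpoint (`/oauth/token-request`). The sketch below shows the shape of that exchange for a confidential client; `build_token_request` and its parameters are illustrative, not part of the Datafi API:

```python
import base64
from urllib.parse import urlencode

def build_token_request(account: str, client_id: str, client_secret: str,
                        code: str, redirect_uri: str) -> tuple[str, dict, str]:
    """Assemble the authorization-code exchange for Snowflake's OAuth
    token endpoint. Returns (url, headers, body); POSTing it yields an
    access token the client can attach to subsequent sessions."""
    url = f"https://{account}.snowflakecomputing.com/oauth/token-request"
    # Confidential clients authenticate with HTTP Basic auth.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    })
    return url, headers, body
```

Note that the redirect URI in the exchange must match the `OAUTH_REDIRECT_URI` registered on the security integration exactly.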
OAuth is recommended for organizations that require centralized credential rotation and token lifecycle management through their identity provider.
Connection Parameters
| Parameter | Description | Required |
|---|---|---|
| Account Identifier | Your Snowflake account locator (e.g., xy12345.us-east-1). | Yes |
| Username | The Snowflake user account Datafi uses to connect. | Yes (JWT) |
| Private Key | The PKCS#8 private key file for JWT signing. | Yes (JWT) |
| Client ID | OAuth client identifier from the security integration. | Yes (OAuth) |
| Client Secret | OAuth client secret. | Yes (OAuth) |
| Database | The default database context for queries. | No |
| Schema | The default schema within the selected database. | No |
| Role | The Snowflake role assumed for all sessions. | No |
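The required/optional split in the table depends on the authentication method, which can be expressed as a small validation check. A minimal sketch; the dict shape here is hypothetical, not Datafi's actual configuration format:

```python
# Required parameters per authentication method, mirroring the table above.
REQUIRED = {
    "jwt": {"account_identifier", "username", "private_key"},
    "oauth": {"account_identifier", "client_id", "client_secret"},
}
OPTIONAL = {"database", "schema", "role"}  # default session context

def missing_params(config: dict) -> set:
    """Return the required parameters absent from a connection config."""
    method = config.get("auth_method")
    if method not in REQUIRED:
        raise ValueError(f"unknown auth_method: {method!r}")
    provided = {k for k, v in config.items() if v}
    return REQUIRED[method] - provided
```

For example, a JWT config that omits the private key would fail this check before any connection attempt is made.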
Warehouse Selection
Datafi allows you to select the Snowflake virtual warehouse used for query execution. You configure this at the data source level.
| Warehouse Setting | Description |
|---|---|
| Default Warehouse | The warehouse used for all queries unless overridden. |
| Auto-Suspend | Datafi respects your warehouse auto-suspend settings. No additional configuration is needed. |
| Warehouse Size | Choose the size that matches your expected workload (X-Small through 6X-Large). |
If the specified warehouse is suspended when a query arrives, Snowflake automatically resumes it. Resume time depends on the warehouse size and typically takes a few seconds.
Schema Introspection
When you connect a Snowflake source, Datafi performs automatic schema introspection to catalog all accessible databases, schemas, tables, views, and columns. This metadata powers the Data Catalog, autocomplete in the query editor, and policy enforcement.
Introspection runs:
- On initial connection -- A full scan of all accessible objects.
- On schedule -- Periodic refresh (configurable, default every 6 hours).
- On demand -- You can trigger a manual refresh from the data source settings.
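Introspection draws on metadata Snowflake already exposes: accessible tables and columns are enumerable through the standard INFORMATION_SCHEMA views. The queries below illustrate that metadata source, not Datafi's internal implementation:

```python
# Illustrative metadata queries against Snowflake's INFORMATION_SCHEMA views.
TABLES_SQL = """\
SELECT table_catalog, table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_schema <> 'INFORMATION_SCHEMA'
"""

COLUMNS_SQL = """\
SELECT table_schema, table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema <> 'INFORMATION_SCHEMA'
ORDER BY table_schema, table_name, ordinal_position
"""
```

Because these views only expose objects the session's role can access, the catalog reflects the privileges of the configured Snowflake role.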
PRQL Compilation to Snowflake SQL
Datafi uses PRQL (Pipelined Relational Query Language) as its intermediate query representation. When you write a query in Datafi -- whether through SQL, natural language, or a Data View -- the Coordinator compiles it to PRQL and then to Snowflake-compatible SQL.
Example PRQL to Snowflake SQL:

```
from orders
filter order_date >= @2024-01-01
derive total = quantity * unit_price
sort {-total}
take 100
```

Compiles to:

```sql
SELECT *, quantity * unit_price AS total
FROM orders
WHERE order_date >= '2024-01-01'
ORDER BY total DESC
LIMIT 100;
```
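The correspondence between pipeline steps and SQL clauses in the example above can be made explicit. This is a toy mapping for illustration only; Datafi's compiler (like the open-source PRQL compiler it resembles) handles far more than this:

```python
# Toy illustration: where each PRQL pipeline step in the worked example
# lands in the generated SQL. Not a real compiler.
STEP_TO_CLAUSE = {
    "filter order_date >= @2024-01-01": "WHERE order_date >= '2024-01-01'",
    "derive total = quantity * unit_price": "SELECT *, quantity * unit_price AS total",
    "sort {-total}": "ORDER BY total DESC",
    "take 100": "LIMIT 100",
}

def compile_example(table: str) -> str:
    """Assemble the clauses in SQL order for the worked example above."""
    return "\n".join([
        STEP_TO_CLAUSE["derive total = quantity * unit_price"],
        f"FROM {table}",
        STEP_TO_CLAUSE["filter order_date >= @2024-01-01"],
        STEP_TO_CLAUSE["sort {-total}"],
        STEP_TO_CLAUSE["take 100"],
    ]) + ";"
```

Note how the pipeline order (filter, derive, sort, take) differs from SQL's clause order; reordering steps into valid clause positions is part of what the compiler does.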
The PRQL compiler handles Snowflake-specific syntax -- including QUALIFY, FLATTEN, semi-structured data access, and Snowflake SQL functions -- automatically.
Troubleshooting
| Issue | Possible Cause | Resolution |
|---|---|---|
| JWT token is invalid | Public key mismatch or expired token. | Re-register the public key with the Snowflake user. |
| Warehouse is not accessible | The Datafi role lacks USAGE privilege on the warehouse. | Grant USAGE on the warehouse to the configured role. |
| Schema introspection timeout | Large number of objects or network latency. | Narrow the default database/schema or increase the introspection timeout. |
Next Steps
- PostgreSQL and MySQL -- Connect relational databases.
- Cloud Warehouses -- Set up BigQuery, Databricks, Redshift, or Synapse.