Cloud Warehouses
Datafi integrates with the major cloud data warehouses so you can federate queries across platforms without moving data. This page covers the setup and configuration for BigQuery, Databricks, Redshift, and Azure Synapse Analytics.
Google BigQuery
Authentication
Datafi connects to BigQuery using a Google Cloud service account. The service account must have the appropriate IAM roles to read data from your project and datasets.
| Parameter | Description | Required |
|---|---|---|
| Service Account Key | A JSON key file downloaded from the Google Cloud Console. | Yes |
| Project ID | The Google Cloud project containing your BigQuery datasets. | Yes |
| Default Dataset | The dataset used as the default context for unqualified table references. | No |
Required IAM roles:
| Role | Purpose |
|---|---|
| roles/bigquery.dataViewer | Read access to tables and views. |
| roles/bigquery.jobUser | Permission to execute query jobs. |
| roles/bigquery.metadataViewer | Access to dataset and table metadata for schema introspection. |
Setup Steps
- Create a service account in Google Cloud Console with the roles listed above.
- Download the JSON key file.
- In Datafi, navigate to Integrations > Add Data Source > BigQuery.
- Upload the service account key file and enter the project ID.
- Click Test Connection, then Save.
If you manage multiple Google Cloud projects, create a separate Datafi data source for each project. This keeps IAM permissions scoped and makes it easier to track usage per project.
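To sanity-check the key before creating the Datafi data source, you can run a quick query with the google-cloud-bigquery client library. This is a minimal sketch; the key file path, project ID, and query are placeholders, and the calls succeed only if the roles listed above have been granted.

```python
# Minimal sketch: confirm the service account key works before creating the
# Datafi data source. Key file path, project ID, and query are placeholders.
# Packages: pip install google-cloud-bigquery
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "datafi-bigquery-key.json"
)
client = bigquery.Client(credentials=credentials, project="my-analytics-project")

# Listing datasets exercises the metadata roles; running a query exercises
# roles/bigquery.jobUser.
print([dataset.dataset_id for dataset in client.list_datasets()])
print(list(client.query("SELECT 1 AS ok").result()))
```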
Databricks
Authentication
Datafi connects to Databricks using a personal access token or an OAuth machine-to-machine (M2M) credential. The connection targets your Databricks SQL Warehouse or cluster endpoint.
| Parameter | Description | Required |
|---|---|---|
| Host | The Databricks workspace URL (e.g., adb-1234567890.12.azuredatabricks.net). | Yes |
| HTTP Path | The HTTP path to your SQL Warehouse or cluster (e.g., /sql/1.0/warehouses/abc123). | Yes |
| Access Token | A personal access token or service principal token. | Yes |
| Catalog | The Unity Catalog name for catalog-level access control. | No |
| Schema | The default schema within the selected catalog. | No |
Delta Lake and Unity Catalog
Datafi supports querying Delta Lake tables through Databricks. When you connect to a Databricks workspace with Unity Catalog enabled, Datafi introspects the three-level namespace: catalog.schema.table.
| Feature | Support |
|---|---|
| Delta Lake reads | Fully supported through Databricks SQL. |
| Time travel queries | Supported via VERSION AS OF and TIMESTAMP AS OF syntax. |
| Unity Catalog | Three-level namespace browsing and policy enforcement. |
| Photon acceleration | Transparent -- queries benefit from Photon if enabled on the warehouse. |
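Because time travel is expressed in the query itself, you can preview it with any Databricks SQL client before running it through Datafi. Below is a minimal sketch using the databricks-sql-connector package; the workspace host, HTTP path, token, and the table name main.sales.orders are placeholders.

```python
# Minimal sketch: Delta time travel through a Databricks SQL Warehouse.
# Packages: pip install databricks-sql-connector
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890.12.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapi-your-token",
) as connection:
    with connection.cursor() as cursor:
        # Read a Unity Catalog table (catalog.schema.table) as of a Delta version...
        cursor.execute("SELECT COUNT(*) FROM main.sales.orders VERSION AS OF 12")
        print(cursor.fetchone())
        # ...or as of a point in time.
        cursor.execute(
            "SELECT COUNT(*) FROM main.sales.orders TIMESTAMP AS OF '2024-01-01'"
        )
        print(cursor.fetchone())
```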
Setup Steps
- Generate a personal access token in your Databricks workspace settings.
- Note the SQL Warehouse HTTP path from the warehouse connection details.
- In Datafi, navigate to Integrations > Add Data Source > Databricks.
- Enter the host, HTTP path, and access token.
- Optionally specify a catalog and schema.
- Click Test Connection, then Save.
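If Test Connection fails, reproducing the same host, HTTP path, and token outside Datafi can help narrow down whether the problem is the credentials or the network path. A minimal sketch with the databricks-sql-connector package and placeholder values:

```python
# Minimal connectivity check against the SQL Warehouse using the same
# parameters entered in Datafi (placeholders shown).
# Packages: pip install databricks-sql-connector
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890.12.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapi-your-token",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_schema()")
        print(cursor.fetchone())
```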
Amazon Redshift
Authentication
Datafi connects to Redshift using IAM authentication or database credentials (username/password). IAM is recommended for production workloads.
| Parameter | Description | Required |
|---|---|---|
| Cluster Endpoint | The Redshift cluster endpoint (e.g., my-cluster.abc123.us-east-1.redshift.amazonaws.com). | Yes |
| Port | The cluster port (default: 5439). | Yes |
| Database | The database name. | Yes |
| Username | The database user or IAM user. | Yes |
| Password | The database password. Required for password-based auth. | Conditional |
| IAM Role ARN | The IAM role ARN for IAM-based auth. | Conditional |
| AWS Region | The AWS region where the cluster is deployed. | Yes |
IAM Authentication Setup
- Create an IAM policy that grants redshift:GetClusterCredentials and redshift:DescribeClusters.
- Attach the policy to the IAM role or user that Datafi will assume.
- In Datafi, select IAM as the authentication method and enter the IAM role ARN.
IAM authentication generates temporary credentials that are automatically rotated. This eliminates the need to manage long-lived database passwords.
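The mechanism behind this is Redshift's temporary-credentials API. The sketch below shows the flow with boto3 and the redshift-connector package, using the caller's own IAM identity directly rather than an assumed role; the cluster, database, and user names are placeholders.

```python
# Minimal sketch of IAM-based access: request short-lived credentials, then
# connect with them instead of a stored password. Placeholders throughout.
# Packages: pip install boto3 redshift-connector
import boto3
import redshift_connector

redshift = boto3.client("redshift", region_name="us-east-1")
temp = redshift.get_cluster_credentials(
    DbUser="datafi_user",
    DbName="analytics",
    ClusterIdentifier="my-cluster",
    DurationSeconds=900,
    AutoCreate=False,
)

connection = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    database="analytics",
    user=temp["DbUser"],          # temporary user, e.g. "IAM:datafi_user"
    password=temp["DbPassword"],  # expires after DurationSeconds
)
cursor = connection.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchone())
connection.close()
```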
Redshift Serverless
For Redshift Serverless, use the workgroup endpoint in place of the cluster endpoint. All other parameters remain the same.
Azure Synapse Analytics
Authentication
Datafi connects to Azure Synapse using Azure Active Directory (Azure AD) authentication or SQL authentication.
| Parameter | Description | Required |
|---|---|---|
| Server | The Synapse SQL endpoint (e.g., my-workspace.sql.azuresynapse.net). | Yes |
| Port | The connection port (default: 1433). | Yes |
| Database | The dedicated SQL pool or serverless pool name. | Yes |
| Auth Method | Azure AD or SQL Authentication. | Yes |
| Client ID | Azure AD application (client) ID. | Yes (Azure AD) |
| Client Secret | Azure AD application secret. | Yes (Azure AD) |
| Tenant ID | Azure AD tenant ID. | Yes (Azure AD) |
| Username | SQL authentication username. | Yes (SQL Auth) |
| Password | SQL authentication password. | Yes (SQL Auth) |
Setup Steps
- Register an application in Azure AD with permissions to access the Synapse workspace.
- Grant the application the db_datareader role on the target database.
- In Datafi, navigate to Integrations > Add Data Source > Synapse.
- Enter the server endpoint, database, and Azure AD credentials.
- Click Test Connection, then Save.
Azure AD authentication is recommended for Synapse because it integrates with your organization's conditional access policies, MFA requirements, and centralized identity management.
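Before saving the connection, you can confirm that the registered application can reach the SQL endpoint with a service-principal login. The sketch below uses pyodbc and assumes the Microsoft ODBC Driver 18 for SQL Server is installed; the server, database, client ID, and secret are placeholders.

```python
# Minimal sketch: Azure AD service principal login to the Synapse SQL endpoint.
# Packages: pip install pyodbc (plus the Microsoft ODBC Driver 18 for SQL Server)
import pyodbc

connection = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=my-workspace.sql.azuresynapse.net,1433;"
    "DATABASE=sales_pool;"
    "UID=<application-client-id>;"
    "PWD=<client-secret>;"
    "Authentication=ActiveDirectoryServicePrincipal;"
    "Encrypt=yes;"
)
cursor = connection.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchone())
connection.close()
```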
Comparison
| Feature | BigQuery | Databricks | Redshift | Synapse |
|---|---|---|---|---|
| Auth Methods | Service account | Token / OAuth | IAM / Password | Azure AD / SQL |
| Schema Introspection | Project > Dataset > Table | Catalog > Schema > Table | Database > Schema > Table | Database > Schema > Table |
| Concurrent Query Limit | Per-project quota | Per-warehouse limit | WLM queue slots | Resource class |
| Federated with Datafi | Yes | Yes | Yes | Yes |
Next Steps
- Snowflake -- Connect to Snowflake with JWT or OAuth.
- PostgreSQL and MySQL -- Connect relational databases.