Cloud Warehouses

Datafi integrates with the major cloud data warehouses so you can federate queries across platforms without moving data. This page covers the setup and configuration for BigQuery, Databricks, Redshift, and Azure Synapse Analytics.


Google BigQuery

Authentication

Datafi connects to BigQuery using a Google Cloud service account. The service account must have the appropriate IAM roles to read data from your project and datasets.

| Parameter | Description | Required |
| --- | --- | --- |
| Service Account Key | A JSON key file downloaded from the Google Cloud Console. | Yes |
| Project ID | The Google Cloud project containing your BigQuery datasets. | Yes |
| Default Dataset | The dataset used as the default context for unqualified table references. | No |

Required IAM roles:

| Role | Purpose |
| --- | --- |
| roles/bigquery.dataViewer | Read access to tables and views. |
| roles/bigquery.jobUser | Permission to execute query jobs. |
| roles/bigquery.metadataViewer | Access to dataset and table metadata for schema introspection. |

Setup Steps

  1. Create a service account in Google Cloud Console with the roles listed above.
  2. Download the JSON key file.
  3. In Datafi, navigate to Integrations > Add Data Source > BigQuery.
  4. Upload the service account key file and enter the project ID.
  5. Click Test Connection, then Save.
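If the connection test fails, it can help to confirm the key works outside Datafi first. Here is a minimal sketch using the google-cloud-bigquery client library; the key file path and project ID are placeholders you would replace with your own values:

```python
# Hedged sketch: confirm the service account key can run a BigQuery job.
# The key path and project ID below are placeholders.
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "datafi-sa-key.json"  # the JSON key downloaded in step 2
)
client = bigquery.Client(credentials=credentials, project="my-project-id")

# Running a query requires roles/bigquery.jobUser; reading a table
# additionally requires roles/bigquery.dataViewer.
rows = client.query("SELECT 1 AS ok").result()
print(list(rows))
```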
Tip: If you manage multiple Google Cloud projects, create a separate Datafi data source for each project. This keeps IAM permissions scoped and makes it easier to track usage per project.


Databricks

Authentication

Datafi connects to Databricks using a personal access token or OAuth (M2M) credential. The connection targets your Databricks SQL Warehouse or cluster endpoint.

| Parameter | Description | Required |
| --- | --- | --- |
| Host | The Databricks workspace URL (e.g., adb-1234567890.12.azuredatabricks.net). | Yes |
| HTTP Path | The HTTP path to your SQL Warehouse or cluster (e.g., /sql/1.0/warehouses/abc123). | Yes |
| Access Token | A personal access token or service principal token. | Yes |
| Catalog | The Unity Catalog name for catalog-level access control. | No |
| Schema | The default schema within the selected catalog. | No |

Delta Lake and Unity Catalog

Datafi supports querying Delta Lake tables through Databricks. When you connect to a Databricks workspace with Unity Catalog enabled, Datafi introspects the three-level namespace: catalog.schema.table.

| Feature | Support |
| --- | --- |
| Delta Lake reads | Fully supported through Databricks SQL. |
| Time travel queries | Supported via VERSION AS OF and TIMESTAMP AS OF syntax. |
| Unity Catalog | Three-level namespace browsing and policy enforcement. |
| Photon acceleration | Transparent; queries benefit from Photon if enabled on the warehouse. |

Setup Steps

  1. Generate a personal access token in your Databricks workspace settings.
  2. Note the SQL Warehouse HTTP path from the warehouse connection details.
  3. In Datafi, navigate to Integrations > Add Data Source > Databricks.
  4. Enter the host, HTTP path, and access token.
  5. Optionally specify a catalog and schema.
  6. Click Test Connection, then Save.
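To sanity-check these values before entering them in Datafi, you can connect directly with the databricks-sql-connector package. A minimal sketch follows; the hostname, HTTP path, token, and table name are placeholders, and the time-travel syntax from the table above runs through the same cursor:

```python
# Hedged sketch: test a Databricks SQL Warehouse connection and run a
# Delta time-travel query. All identifiers below are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890.12.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapi...",
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT 1")  # basic connectivity check
        print(cursor.fetchone())

        # Delta time travel: read a table as of an earlier version.
        cursor.execute(
            "SELECT COUNT(*) FROM main.sales.events VERSION AS OF 42"
        )
        print(cursor.fetchone())
```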

Amazon Redshift

Authentication

Datafi connects to Redshift using IAM authentication or database credentials (username/password). IAM is recommended for production workloads.

| Parameter | Description | Required |
| --- | --- | --- |
| Cluster Endpoint | The Redshift cluster endpoint (e.g., my-cluster.abc123.us-east-1.redshift.amazonaws.com). | Yes |
| Port | The cluster port (default: 5439). | Yes |
| Database | The database name. | Yes |
| Username | The database user or IAM user. | Yes |
| Password | The database password. | Yes (password auth) |
| IAM Role ARN | The IAM role ARN for IAM-based authentication. | Yes (IAM auth) |
| AWS Region | The AWS region where the cluster is deployed. | Yes |

IAM Authentication Setup

  1. Create an IAM policy that grants redshift:GetClusterCredentials and redshift:DescribeClusters.
  2. Attach the policy to the IAM role or user that Datafi will assume.
  3. In Datafi, select IAM as the authentication method and enter the IAM role ARN.
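The same credential flow can be reproduced outside Datafi to verify the IAM policy. A minimal sketch using boto3 and psycopg2; the cluster identifier, endpoint, database, and user names are placeholders:

```python
# Hedged sketch: fetch temporary Redshift credentials via IAM, then
# connect with psycopg2. All identifiers below are placeholders.
import boto3
import psycopg2

redshift = boto3.client("redshift", region_name="us-east-1")
creds = redshift.get_cluster_credentials(
    DbUser="datafi_user",       # database user to generate credentials for
    DbName="analytics",
    ClusterIdentifier="my-cluster",
    DurationSeconds=900,        # temporary credentials expire automatically
)

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user=creds["DbUser"],       # returned with an "IAM:" prefix
    password=creds["DbPassword"],
    sslmode="require",          # Redshift clusters typically require SSL
)
```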
Note: IAM authentication generates temporary credentials that are automatically rotated. This eliminates the need to manage long-lived database passwords.

Redshift Serverless

For Redshift Serverless, use the workgroup endpoint in place of the cluster endpoint. All other parameters remain the same.
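For the credential flow, the serverless analogue uses the redshift-serverless API rather than the provisioned-cluster one. A brief sketch, with the workgroup and database names as placeholders:

```python
# Hedged sketch: temporary credentials for Redshift Serverless come from
# the redshift-serverless API. Workgroup and database are placeholders.
import boto3

serverless = boto3.client("redshift-serverless", region_name="us-east-1")
creds = serverless.get_credentials(
    workgroupName="my-workgroup",
    dbName="analytics",
)
# creds["dbUser"] / creds["dbPassword"] are used exactly as in the
# provisioned-cluster example above, against the workgroup endpoint.
```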


Azure Synapse Analytics

Authentication

Datafi connects to Azure Synapse using Azure Active Directory (Azure AD) authentication or SQL authentication.

| Parameter | Description | Required |
| --- | --- | --- |
| Server | The Synapse SQL endpoint (e.g., my-workspace.sql.azuresynapse.net). | Yes |
| Port | The connection port (default: 1433). | Yes |
| Database | The dedicated SQL pool or serverless pool name. | Yes |
| Auth Method | Azure AD or SQL Authentication. | Yes |
| Client ID | Azure AD application (client) ID. | Yes (Azure AD) |
| Client Secret | Azure AD application secret. | Yes (Azure AD) |
| Tenant ID | Azure AD tenant ID. | Yes (Azure AD) |
| Username | SQL authentication username. | Yes (SQL auth) |
| Password | SQL authentication password. | Yes (SQL auth) |

Setup Steps

  1. Register an application in Azure AD with permissions to access the Synapse workspace.
  2. Grant the application the db_datareader role on the target database.
  3. In Datafi, navigate to Integrations > Add Data Source > Synapse.
  4. Enter the server endpoint, database, and Azure AD credentials.
  5. Click Test Connection, then Save.
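To verify the service principal independently of Datafi, you can connect with pyodbc and the Microsoft ODBC driver. A minimal sketch, assuming ODBC Driver 18 for SQL Server is installed; the server, database, client ID, and secret are placeholders:

```python
# Hedged sketch: connect to Synapse as an Azure AD service principal
# using pyodbc. All identifiers below are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=my-workspace.sql.azuresynapse.net,1433;"
    "Database=mypool;"
    "Encrypt=yes;"
    "Authentication=ActiveDirectoryServicePrincipal;"
    "UID=<application-client-id>;"
    "PWD=<client-secret>"
)
cursor = conn.cursor()
cursor.execute("SELECT 1")  # simple connectivity check
print(cursor.fetchone())
```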
Tip: Azure AD authentication is recommended for Synapse because it integrates with your organization's conditional access policies, MFA requirements, and centralized identity management.


Comparison

| Feature | BigQuery | Databricks | Redshift | Synapse |
| --- | --- | --- | --- | --- |
| Auth Methods | Service account | Token / OAuth | IAM / Password | Azure AD / SQL |
| Schema Introspection | Project > Dataset > Table | Catalog > Schema > Table | Database > Schema > Table | Database > Schema > Table |
| Concurrent Query Limit | Per-project quota | Per-warehouse limit | WLM queue slots | Resource class |
| Federated with Datafi | Yes | Yes | Yes | Yes |

Next Steps