Connecting Datasets
Connecting a data source to Datafi gives your organization federated, governed access to that data without moving or copying it. The connection workflow walks you through seven steps, from selecting a connector to configuring user access and dataset rules.
Before You Begin
Make sure you have the following ready:
- Database credentials -- A username and password (or other authentication method) with read access to the target database.
- Network access -- An Edge Server deployed in the same network as your data source, or a cloud-hosted Edge Server with connectivity to it.
- Permissions -- You must have the Data Owner role in Datafi to connect a new dataset.
The Edge Server you select determines the network path between Datafi and your data source. Always choose the Edge Server that is closest to your database -- ideally one deployed in the same VPC, subnet, or on-premises network. Selecting a distant Edge Server increases query latency and can cause connection timeouts.
The 7-Step Connection Process
Step 1: Select a Connector
Open the Data Catalog and click Add Dataset. You are presented with the full list of supported connectors. Choose the one that matches your data source:
- Relational databases -- MySQL, PostgreSQL, Aurora, MariaDB, MSSQL, Oracle
- Data warehouses -- Snowflake, Databricks, BigQuery, Synapse
- SaaS platforms -- Salesforce, Dynamics 365, NetSuite
Click the connector tile to proceed.
Step 2: Provide a Dataset Name
Enter a Dataset Name that your team will use to identify this source inside Datafi. Choose something descriptive and unique within your workspace -- for example, production-orders-db or salesforce-west-region.
The dataset name is how users discover this source in the catalog. Use a consistent naming pattern across your organization so that datasets are easy to find and distinguish.
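One way to keep names consistent is to check them before you type them into the form. The sketch below assumes a hypothetical team convention (lowercase words joined by single hyphens, at most 64 characters) and a set of names already used in the workspace. Datafi does not necessarily enforce this pattern; `DATASET_NAME_PATTERN`, `MAX_NAME_LENGTH`, and `is_valid_dataset_name` are illustrative names only.

```python
import re

# Hypothetical convention, e.g. "production-orders-db": lowercase letters and
# digits, words separated by single hyphens. Not a Datafi-enforced rule.
DATASET_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MAX_NAME_LENGTH = 64  # assumed limit for this sketch


def is_valid_dataset_name(name: str, existing: set[str]) -> bool:
    """Return True if name follows the convention and is unused in the workspace."""
    if len(name) > MAX_NAME_LENGTH:
        return False
    if not DATASET_NAME_PATTERN.fullmatch(name):
        return False
    # Compare case-insensitively so "Orders-DB" and "orders-db" count as duplicates.
    return name.lower() not in {n.lower() for n in existing}


existing = {"production-orders-db"}
print(is_valid_dataset_name("salesforce-west-region", existing))  # True
print(is_valid_dataset_name("production-orders-db", existing))    # False (already used)
print(is_valid_dataset_name("Prod Orders", existing))             # False (uppercase, space)
```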
Step 3: Select an Edge Server
Choose the Edge Server that will broker the connection between Datafi and your database. The dropdown lists every Edge Server registered in your workspace along with its status and region.
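The guidance in Before You Begin (prefer an Edge Server in the same network as the database, then the nearest one) can be expressed as a ranking. This is a minimal sketch, assuming you have recorded each server's status and region from the dropdown plus two hypothetical extra attributes, `network` and a measured `latency_ms`; `EdgeServer` and `pick_edge_server` are illustrative names, not part of Datafi.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    status: str        # as shown in the dropdown, e.g. "online" or "offline"
    region: str        # as shown in the dropdown
    network: str       # hypothetical: VPC, subnet, or on-prem network it runs in
    latency_ms: float  # hypothetical: measured round-trip time to the database


def pick_edge_server(servers, db_network, db_region):
    """Prefer an online server in the database's network, then its region,
    then the lowest measured latency."""
    online = [s for s in servers if s.status == "online"]
    if not online:
        raise ValueError("no online Edge Server available")

    def rank(s):
        # False sorts before True, so matching network and region rank first;
        # latency breaks any remaining ties.
        return (s.network != db_network, s.region != db_region, s.latency_ms)

    return min(online, key=rank)


servers = [
    EdgeServer("edge-us-east-cloud", "online", "us-east-1", "shared-cloud", 38.0),
    EdgeServer("edge-orders-vpc", "online", "us-east-1", "vpc-orders", 2.5),
    EdgeServer("edge-orders-standby", "offline", "us-east-1", "vpc-orders", 1.0),
]
print(pick_edge_server(servers, "vpc-orders", "us-east-1").name)  # edge-orders-vpc
```

The offline server is skipped even though it sits in the right network, which mirrors the advice to check each server's status before selecting it.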
Step 4: Configure Connection Credentials
Fill in the credential fields required by your chosen connector. The exact fields vary by connector type, but common fields include:
| Field | Description |
|---|---|
| Database Name | The name of the specific database or catalog you want to connect to. |
| Username | The database user account Datafi will use to authenticate. |
| Auth Type | The authentication method -- password, key pair, OAuth, or token, depending on the connector. |
| Account | The account identifier, used primarily by cloud warehouses such as Snowflake. |
| Warehouse | The compute warehouse to use for queries (Snowflake and similar platforms). |
| Schema | The default schema to expose. You can adjust schema visibility later. |
| Role | The database role assumed during queries (Snowflake and similar platforms). |
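Because the required fields differ by connector, it can help to check a set of credentials for gaps before submitting the form. The sketch below models the fields from the table above. The `REQUIRED_FIELDS` mapping is an assumption made for illustration (for example, that Snowflake needs Account, Warehouse, and Role); it is not Datafi's actual validation logic.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed per-connector requirements for this sketch only.
REQUIRED_FIELDS = {
    "postgresql": ["database_name", "username", "auth_type"],
    "snowflake": ["database_name", "username", "auth_type",
                  "account", "warehouse", "role"],
}


@dataclass
class ConnectionCredentials:
    connector: str
    database_name: Optional[str] = None
    username: Optional[str] = None
    auth_type: Optional[str] = None   # password, key pair, OAuth, or token
    account: Optional[str] = None     # mainly cloud warehouses such as Snowflake
    warehouse: Optional[str] = None   # Snowflake and similar platforms
    schema: Optional[str] = None      # default schema to expose
    role: Optional[str] = None        # Snowflake and similar platforms

    def missing_fields(self) -> list[str]:
        """Return the required fields that are still empty for this connector."""
        required = REQUIRED_FIELDS.get(self.connector, [])
        return [f for f in required if not getattr(self, f)]


creds = ConnectionCredentials(
    connector="snowflake",
    database_name="ANALYTICS",
    username="DATAFI_READER",
    auth_type="key pair",
    account="myorg-myaccount",
)
print(creds.missing_fields())  # ['warehouse', 'role']
```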
Use a database account with the minimum permissions necessary. Datafi needs only read access for federated queries, so granting broader permissions enlarges your attack surface without adding any functionality.
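What "read access only" means depends on the database. As one example, assuming a PostgreSQL source and a login role (here the hypothetical `datafi_reader`) that already exists, the helper below builds standard PostgreSQL statements that grant read-only access to a single schema. A database administrator would run the printed statements; the helper itself is only an illustration.

```python
import re

# Accept only simple lowercase identifiers so the generated SQL needs no quoting.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def read_only_grants(database: str, schema: str, user: str) -> list[str]:
    """Build PostgreSQL statements giving an existing user read-only access to one schema."""
    for ident in (database, schema, user):
        if not _IDENT.fullmatch(ident):
            raise ValueError(f"unsupported identifier: {ident!r}")
    return [
        f"GRANT CONNECT ON DATABASE {database} TO {user};",
        f"GRANT USAGE ON SCHEMA {schema} TO {user};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
        # Also cover tables created later by the role that runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT SELECT ON TABLES TO {user};",
    ]


for stmt in read_only_grants("orders", "public", "datafi_reader"):
    print(stmt)
```

No INSERT, UPDATE, DELETE, or DDL privileges are granted, which matches the read-only access that federated queries require.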
Step 5: Test and Review the Connection
After you submit your credentials, Datafi attempts to connect to the data source through the selected Edge Server. If the connection succeeds, the platform retrieves the schema -- tables, columns, data types, and relationships -- and displays it for your review.
Review the schema carefully:
- Confirm that the expected tables and columns are present.
- Verify data types are detected correctly.
- Note any tables you may want to exclude from user access later.
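If you know which tables and column types you expect, the review checklist above can be partly automated. This sketch assumes you have copied the retrieved schema into a plain dictionary of table name to `{column: data type}`; Datafi does not necessarily expose the schema in this shape, and `review_schema` is a hypothetical helper.

```python
def review_schema(retrieved, expected):
    """Compare a retrieved schema with expectations and list anything to review.

    Both arguments map table name -> {column name: data type}.
    """
    issues = []
    for table, columns in expected.items():
        if table not in retrieved:
            issues.append(f"missing table: {table}")
            continue
        for column, expected_type in columns.items():
            actual_type = retrieved[table].get(column)
            if actual_type is None:
                issues.append(f"missing column: {table}.{column}")
            elif actual_type.lower() != expected_type.lower():
                issues.append(
                    f"type mismatch: {table}.{column} is {actual_type}, expected {expected_type}"
                )
    # Tables you did not expect are candidates for exclusion from user access.
    for table in sorted(set(retrieved) - set(expected)):
        issues.append(f"unexpected table (review before granting access): {table}")
    return issues


retrieved = {
    "orders": {"id": "integer", "total": "varchar", "created_at": "timestamp"},
    "audit_log": {"id": "integer", "event": "text"},
}
expected = {
    "orders": {"id": "integer", "total": "numeric"},
    "customers": {"id": "integer"},
}
for issue in review_schema(retrieved, expected):
    print(issue)
# type mismatch: orders.total is varchar, expected numeric
# missing table: customers
# unexpected table (review before granting access): audit_log
```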
If the connection fails, verify your credentials, confirm that the selected Edge Server is online and that its network can reach the database host and port, and then try again.
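To separate network problems from credential problems, you can test plain TCP reachability from the machine that hosts the Edge Server. The sketch below uses only Python's standard `socket` module. It confirms that a port accepts connections, not that the credentials are valid. The ports listed are common defaults and may differ in your deployment; the host name in the usage line is a placeholder.

```python
import socket

# Common default ports; your database may listen on a different one.
DEFAULT_PORTS = {"postgresql": 5432, "mysql": 3306, "mssql": 1433, "oracle": 1521}


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS lookup failures, and timeouts,
        # all of which are OSError subclasses.
        return False


# Placeholder host; run this from the Edge Server's network.
print(can_reach("db.internal.example.com", DEFAULT_PORTS["postgresql"]))
```

If this returns `False`, fix routing, firewall rules, or security groups before you revisit the credentials.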
Step 6: Set Up User Access
Define who can access this dataset and what they can see. You can:
- Invite specific users by email address.
- Assign users the Data Owner or Data User role for this dataset.
- Restrict access to specific tables or columns.
You can also skip this step and configure access later from the dataset details page.
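The options above combine into a per-user grant: a role, an optional table allow-list, and a set of hidden columns. The sketch below is a hypothetical model of such a grant, useful for planning access before you configure it in the UI. `DatasetGrant` and `can_read` are illustrative names. The role is recorded but deliberately not interpreted, because the sketch makes no assumption about what each Datafi role permits.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DatasetGrant:
    """Hypothetical record of one user's access to a dataset."""
    email: str
    role: str                                  # "Data Owner" or "Data User"
    allowed_tables: Optional[Set[str]] = None  # None means every table
    hidden_columns: Dict[str, Set[str]] = field(default_factory=dict)

    def can_read(self, table: str, column: str) -> bool:
        """Apply only the table and column restrictions stored in this grant."""
        if self.allowed_tables is not None and table not in self.allowed_tables:
            return False
        return column not in self.hidden_columns.get(table, set())


grant = DatasetGrant(
    "analyst@example.com",
    "Data User",
    allowed_tables={"orders"},
    hidden_columns={"orders": {"card_number"}},
)
print(grant.can_read("orders", "total"))        # True
print(grant.can_read("orders", "card_number"))  # False (hidden column)
print(grant.can_read("customers", "id"))        # False (table not allowed)
```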
Step 7: Set Up Dataset Rules
Configure any dataset-level rules that govern how the data is accessed:
- Row-level policies -- Restrict which rows a user or group can see based on column values.
- Column-level policies -- Hide or mask sensitive columns for specific roles.
- Query limits -- Set maximum row counts or execution time limits.
Once you save your rules, the dataset appears in the Data Catalog and is available to authorized users immediately.
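To show what each rule type does to a result set, the sketch below applies a row-level filter, column masking, and a maximum row count to in-memory rows. It is a conceptual illustration, not how Datafi evaluates its rules. The `"****"` mask and the `apply_rules` name are assumptions, and execution time limits are not modeled.

```python
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]


def apply_rules(rows: List[Row], row_filter: Callable[[Row], bool],
                masked_columns: Set[str], max_rows: int) -> List[Row]:
    """Apply a row-level policy, column masking, and a row limit to a result set."""
    result = []
    for row in rows:
        if len(result) >= max_rows:
            break  # query limit: stop once the maximum row count is reached
        if not row_filter(row):
            continue  # row-level policy: skip rows this user may not see
        # Column-level policy: replace sensitive values with a fixed mask.
        result.append({k: ("****" if k in masked_columns else v) for k, v in row.items()})
    return result


rows = [
    {"id": 1, "region": "west", "email": "a@example.com"},
    {"id": 2, "region": "east", "email": "b@example.com"},
    {"id": 3, "region": "west", "email": "c@example.com"},
]
# A user limited to the west region, with email masked and at most one row.
print(apply_rules(rows, lambda r: r["region"] == "west", {"email"}, max_rows=1))
# [{'id': 1, 'region': 'west', 'email': '****'}]
```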
After Connecting
Once the dataset is live, you can:
- Browse its schema from the dataset details page.
- Query it using the Datafi query editor or natural-language chat.
- Share it with additional users or external partners.
- Apply tags, descriptions, and regional metadata to improve discoverability.
For ongoing management tasks, see Managing Datasets.