Connecting Datasets

Connecting a data source to Datafi gives your organization federated, governed access to that data without moving or copying it. The connection workflow walks you through seven steps, from selecting a connector to configuring dataset rules.

Before You Begin

Make sure you have the following ready:

Database credentials -- A username and password (or other authentication method) with read access to the target database.
Network access -- An Edge Server deployed in the same network as your data source, or a cloud-hosted Edge Server with connectivity to it.
Permissions -- You must have the Data Owner role in Datafi to connect a new dataset.
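Before starting the workflow, it can help to confirm that the database port is reachable from the host where your Edge Server runs. The following is a minimal sketch using only the Python standard library. It is not part of Datafi, and the host name and port in the usage comment are placeholders.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage (placeholder endpoint, run from the Edge Server's network):
#   can_reach("db.internal.example", 5432)
```

A successful TCP connection only shows that the network path is open. It does not validate credentials or database permissions.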

Choosing the Right Edge Server

The Edge Server you select determines the network path between Datafi and your data source. Always choose the Edge Server that is closest to your database -- ideally one deployed in the same VPC, subnet, or on-premises network. Selecting a distant Edge Server increases query latency and can cause connection timeouts.
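If it is unclear which Edge Server is closest, one option is to measure round-trip times to the database from each candidate host and compare them. The sketch below assumes you have already collected such samples yourself. The server names and numbers are made up for illustration, and Datafi does not necessarily expose latency data in this form.

```python
from statistics import median


def pick_closest(samples_ms: dict[str, list[float]]) -> str:
    """Return the Edge Server name with the lowest median round-trip time."""
    # Ignore servers with no samples; the median is robust to a single slow outlier.
    candidates = {name: median(times) for name, times in samples_ms.items() if times}
    if not candidates:
        raise ValueError("no latency samples provided")
    return min(candidates, key=candidates.get)


# Hypothetical measurements in milliseconds, taken from each Edge Server host.
samples = {
    "edge-same-vpc": [1.2, 1.4, 1.1],
    "edge-other-region": [48.0, 51.5, 47.2],
}
print(pick_closest(samples))  # edge-same-vpc
```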

The 7-Step Connection Process

Step 1: Select a Connector

Open the Data Catalog and click Add Dataset. Datafi displays the full list of supported connectors. Choose the one that matches your data source:

Relational databases -- MySQL, PostgreSQL, Aurora, MariaDB, MSSQL, Oracle
Data warehouses -- Snowflake, Databricks, BigQuery, Synapse
SaaS platforms -- Salesforce, Dynamics 365, NetSuite

Click the connector tile to proceed.

Step 2: Provide a Dataset Name

Enter a Dataset Name that your team will use to identify this source inside Datafi. Choose something descriptive and unique within your workspace -- for example, production-orders-db or salesforce-west-region.

Naming Conventions

The dataset name is how users discover this source in the catalog. Use a consistent naming pattern across your organization so that datasets are easy to find and distinguish.
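A naming pattern is easier to keep consistent when it can be checked automatically. The sketch below validates names against one possible convention: lowercase words separated by single hyphens, as in the examples above. The pattern and the length limit are assumptions for illustration, not rules that Datafi is known to enforce.

```python
import re

# Assumed convention: lowercase letters and digits, in words joined by single hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_dataset_name(name: str, max_length: int = 63) -> bool:
    """Check a proposed dataset name against the assumed team convention."""
    return len(name) <= max_length and NAME_PATTERN.fullmatch(name) is not None


print(is_valid_dataset_name("production-orders-db"))  # True
print(is_valid_dataset_name("Production Orders"))     # False
```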

Step 3: Select an Edge Server

Choose the Edge Server that will broker the connection between Datafi and your database. The dropdown lists every Edge Server registered in your workspace along with its status and region.

Step 4: Configure Connection Credentials

Fill in the credential fields required by your chosen connector. The exact fields vary by connector type, but common fields include:

Field	Description
Database Name	The name of the specific database or catalog you want to connect to.
Username	The database user account Datafi will use to authenticate.
Auth Type	The authentication method -- password, key pair, OAuth, or token, depending on the connector.
Account	The account identifier, used primarily by cloud warehouses such as Snowflake.
Warehouse	The compute warehouse to use for queries (Snowflake and similar platforms).
Schema	The default schema to expose. You can adjust schema visibility later.
Role	The database role assumed during queries (Snowflake and similar platforms).
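Because the required fields differ by connector, a small pre-submission check can catch omissions early. The sketch below is illustrative only: the field keys and the per-connector requirements are assumptions inferred from the table above, not Datafi's actual form schema.

```python
# Assumed: every connector needs these three fields.
COMMON_FIELDS = {"database_name", "username", "auth_type"}

# Assumed extras per connector, based on the field descriptions in the table.
EXTRA_FIELDS = {
    "snowflake": {"account", "warehouse", "role"},
    "postgresql": set(),
}


def missing_fields(connector: str, values: dict[str, str]) -> list[str]:
    """Return the required field names that are absent or blank, sorted by name."""
    required = COMMON_FIELDS | EXTRA_FIELDS.get(connector, set())
    provided = {key for key, value in values.items() if value.strip()}
    return sorted(required - provided)


# Hypothetical Snowflake entry with two fields left out.
form = {
    "database_name": "ANALYTICS",
    "username": "datafi_reader",
    "auth_type": "key pair",
    "account": "example-account",
}
print(missing_fields("snowflake", form))  # ['role', 'warehouse']
```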

Least-Privilege Access

Use a database account with the minimum permissions necessary. Datafi only needs read access for federated queries. Granting broader permissions enlarges your attack surface without adding functionality.
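As an illustration of a read-only account, the sketch below generates PostgreSQL GRANT statements for a dedicated user. It assumes a PostgreSQL source and an existing user. The database, schema, and user names are placeholders, and other database engines use different syntax.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_only_grants(database: str, schema: str, user: str) -> list[str]:
    """Build the statements that give `user` read-only access to one schema."""
    db, sch, usr = quote_ident(database), quote_ident(schema), quote_ident(user)
    return [
        f"GRANT CONNECT ON DATABASE {db} TO {usr};",
        f"GRANT USAGE ON SCHEMA {sch} TO {usr};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {sch} TO {usr};",
    ]


# Placeholder names; review the output before running it as a database administrator.
for statement in read_only_grants("orders", "public", "datafi_reader"):
    print(statement)
```

Note that GRANT SELECT ON ALL TABLES covers only tables that exist when the statement runs. Tables created later need their own grant or a default-privileges setting.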

Step 5: Test and Review the Connection

After you submit your credentials, Datafi attempts to connect to the data source through the selected Edge Server. If the connection succeeds, the platform retrieves the schema -- tables, columns, data types, and relationships -- and displays it for your review.

Review the schema carefully:

Confirm that the expected tables and columns are present.
Verify data types are detected correctly.
Note any tables you may want to exclude from user access later.
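For large schemas, the first two checks can be partly automated by comparing what you expect against what was retrieved. The sketch below assumes you have both schemas available as nested dictionaries (table, then column, then data type). How you export the retrieved schema from Datafi is not covered here, and the sample data is hypothetical.

```python
Schema = dict[str, dict[str, str]]  # table name -> column name -> data type


def review_schema(expected: Schema, discovered: Schema) -> list[str]:
    """List missing tables, missing columns, and data type mismatches."""
    findings = []
    for table, columns in expected.items():
        if table not in discovered:
            findings.append(f"missing table: {table}")
            continue
        for column, dtype in columns.items():
            found = discovered[table].get(column)
            if found is None:
                findings.append(f"missing column: {table}.{column}")
            elif found.lower() != dtype.lower():
                # Type names are compared case-insensitively.
                findings.append(
                    f"type mismatch: {table}.{column} expected {dtype}, found {found}"
                )
    return findings


expected = {"orders": {"id": "integer", "total": "numeric"}, "customers": {"id": "integer"}}
discovered = {"orders": {"id": "integer", "total": "text"}}
for finding in review_schema(expected, discovered):
    print(finding)
```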

If the connection fails, check your credentials, network configuration, and Edge Server status, then try again.

Step 6: Set Up User Access

Define who can access this dataset and what they can see. You can:

Invite specific users by email address.
Assign users the Data Owner or Data User role for this dataset.
Restrict access to specific tables or columns.

You can also skip this step and configure access later from the dataset details page.

Step 7: Set Up Dataset Rules

Configure any dataset-level rules that govern how the data is accessed:

Row-level policies -- Restrict which rows a user or group can see based on column values.
Column-level policies -- Hide or mask sensitive columns for specific roles.
Query limits -- Set maximum row counts or execution time limits.
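To make the three rule types concrete, the sketch below applies a row filter, column masking, and a row limit to an in-memory result set. It is a conceptual illustration only. It does not show how Datafi defines or enforces rules, and the function, its parameters, and the sample data are all hypothetical.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]


def apply_rules(
    rows: list[Row],
    row_filter: Callable[[Row], bool],
    masked_columns: set[str],
    max_rows: Optional[int] = None,
) -> list[Row]:
    """Return the rows a user may see after filtering, masking, and limiting."""
    visible: list[Row] = []
    for row in rows:
        # Query limit: stop once the maximum row count is reached.
        if max_rows is not None and len(visible) >= max_rows:
            break
        # Row-level policy: skip rows the user is not allowed to see.
        if not row_filter(row):
            continue
        # Column-level policy: replace sensitive values with a mask.
        visible.append(
            {key: ("***" if key in masked_columns else value) for key, value in row.items()}
        )
    return visible


rows = [
    {"region": "west", "email": "a@example.com"},
    {"region": "east", "email": "b@example.com"},
    {"region": "west", "email": "c@example.com"},
]
print(apply_rules(rows, lambda r: r["region"] == "west", {"email"}, max_rows=1))
# [{'region': 'west', 'email': '***'}]
```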

Once you save your rules, the dataset appears in the Data Catalog and is available to authorized users immediately.

After Connecting

Once the dataset is live, you can:

Browse its schema from the dataset details page.
Query it using the Datafi query editor or natural-language chat.
Share it with additional users or external partners.
Apply tags, descriptions, and regional metadata to improve discoverability.

For ongoing management tasks, see Managing Datasets.
