
Data Catalog Overview

The Data Catalog is the central hub of the Datafi platform. It is where you discover, connect, and manage every data source your organization works with -- databases, warehouses, SaaS platforms, and uploaded files -- all from a single, unified interface.

Quick Start

If you already have credentials for a data source, you can connect it in minutes. Jump to Connecting Datasets to get started right away.


What the Data Catalog Does

The Data Catalog serves three primary functions:

  1. Discovery -- Browse every data source that has been connected to your Datafi workspace. Search by name, tag, region, or data type to find exactly what you need.
  2. Connection -- Add new data sources using Datafi's library of connectors. The platform handles authentication, schema detection, and access configuration so you can focus on your data, not plumbing.
  3. Management -- View schema details, configure access rules, share datasets with teammates or external partners, and remove sources you no longer need.

Discovery Service

When you open the Data Catalog, you see a consolidated list of every dataset available to you. This includes data sources your organization has connected, files that have been uploaded, and public datasets provided by Datafi for exploration and evaluation.

You can filter and search the catalog to narrow results by:

  • Name -- Full or partial dataset name.
  • Tags -- Labels applied by data owners to categorize sources.
  • Region -- Geographic location of the data source or Edge Server.
  • Source type -- Database engine, file format, or SaaS application.
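The four filters above combine to narrow the catalog list. The following Python sketch models that idea only as an illustration. The `CatalogEntry` record, its field names, the `filter_catalog` function, the case-insensitive partial name match, and the assumption that filters combine with AND semantics are all hypothetical. They are not Datafi's API or documented behavior.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class CatalogEntry:
    # Hypothetical record; field names are illustrative, not Datafi's schema.
    name: str
    tags: set[str] = field(default_factory=set)
    region: str = ""
    source_type: str = ""


def filter_catalog(
    entries: Iterable[CatalogEntry],
    name: Optional[str] = None,
    tags: Optional[set[str]] = None,
    region: Optional[str] = None,
    source_type: Optional[str] = None,
) -> list[CatalogEntry]:
    """Return entries matching every supplied filter (assumed AND semantics)."""
    results = []
    for entry in entries:
        # Name: full or partial match, case-insensitive (an assumption).
        if name is not None and name.lower() not in entry.name.lower():
            continue
        # Tags: the entry must carry every requested tag.
        if tags and not tags <= entry.tags:
            continue
        if region is not None and entry.region != region:
            continue
        if source_type is not None and entry.source_type != source_type:
            continue
        results.append(entry)
    return results


# Usage with made-up sample entries.
catalog = [
    CatalogEntry("sales_orders", {"finance"}, "us-east", "PostgreSQL"),
    CatalogEntry("sales_leads", {"crm"}, "eu-west", "Salesforce"),
    CatalogEntry("inventory", {"ops"}, "us-east", "Snowflake"),
]
print([e.name for e in filter_catalog(catalog, name="sales", region="us-east")])
# ['sales_orders']
```

Each filter that is left unset is simply skipped, so a search with no filters returns the whole list.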

Public and Demo Data

Datafi provides public data sources and demo datasets so you can explore the platform before connecting your own data. See Public and Demo Data for details.


Connector Architecture

Datafi connects to your data sources through a lightweight component called an Edge Server. The Edge Server sits close to your data -- inside your VPC, on-premises network, or cloud environment -- and acts as a secure bridge between the Datafi platform and your databases.

This architecture means:

  • No data movement. Your data stays in its original location. Datafi federates queries to the source in real time.
  • No inbound firewall rules. The Edge Server initiates outbound connections to Datafi, so you do not need to open ports or expose databases to the internet.
  • Low latency. Because the Edge Server is co-located with your data, query execution happens close to the source.
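To make the outbound-only, data-in-place pattern concrete, here is a minimal Python sketch built on stated assumptions. An in-process queue stands in for the Datafi platform, and an in-memory SQLite database stands in for a customer database. The `ControlPlane` and `EdgeServer` classes and their methods are hypothetical; the actual Edge Server protocol is not described on this page. The point is the direction of the calls: the edge side initiates every exchange, runs the query next to the data, and sends back only the result set.

```python
import queue
import sqlite3
from typing import Optional


class ControlPlane:
    """Hypothetical stand-in for the hosted platform. It queues queries and
    collects results but never opens a connection to the database."""

    def __init__(self) -> None:
        self.pending: "queue.Queue[tuple[int, str]]" = queue.Queue()
        self.results: dict[int, list[tuple]] = {}

    def submit(self, query_id: int, sql: str) -> None:
        self.pending.put((query_id, sql))

    # The two methods below are called *by* the edge side (outbound traffic).
    def fetch_work(self) -> Optional[tuple[int, str]]:
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None

    def post_result(self, query_id: int, rows: list[tuple]) -> None:
        self.results[query_id] = rows


class EdgeServer:
    """Hypothetical edge agent co-located with the data."""

    def __init__(self, db: sqlite3.Connection, platform: ControlPlane) -> None:
        self.db = db              # local connection, never exposed externally
        self.platform = platform  # reached only through outbound calls

    def run_once(self) -> bool:
        work = self.platform.fetch_work()  # the edge initiates the exchange
        if work is None:
            return False
        query_id, sql = work
        rows = self.db.execute(sql).fetchall()     # runs next to the data
        self.platform.post_result(query_id, rows)  # only results leave
        return True


# Usage: the table rows stay in the local database; only the aggregate returns.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, total REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
platform = ControlPlane()
edge = EdgeServer(db, platform)
platform.submit(1, "SELECT COUNT(*), SUM(total) FROM orders")
edge.run_once()
print(platform.results[1])  # [(2, 35.5)]
```

A real agent would loop over `run_once` (or hold a long-lived outbound connection) rather than call it a single time; the single call keeps the sketch deterministic.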

Supported Connectors

Datafi supports a broad and growing set of data sources:

Category               Connectors
Relational databases   MySQL, PostgreSQL, Aurora, MariaDB, MSSQL, Oracle
Data warehouses        Snowflake, Databricks, BigQuery, Synapse
SaaS platforms         Salesforce, Dynamics 365, NetSuite
File uploads           CSV, JSON

No Size Limits on Connected Sources

There is no size limit on data sources you connect through a connector. You pay based on query usage, not data volume. Uploaded files are subject to storage limits -- see Uploading Files for details.


Roles in the Data Catalog

Access to each dataset is governed by two roles:

  • Data Owner -- Has full administrative control over the dataset, including the ability to configure schema access, manage sharing, define policies, and delete the source.
  • Data User -- Has limited, read-oriented access as granted by the Data Owner.
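The split between the two roles can be pictured as a simple permission check. The Python sketch below is a hypothetical model, not Datafi's authorization system. The action names are invented labels for the Data Owner capabilities listed above, and the sketch assumes that a Data User can only read, and only after the Data Owner has granted access.

```python
from enum import Enum, auto


class Role(Enum):
    DATA_OWNER = auto()
    DATA_USER = auto()


# Hypothetical action labels derived from the role descriptions above.
OWNER_ACTIONS = {
    "read",
    "configure_schema_access",
    "manage_sharing",
    "define_policies",
    "delete_source",
}
USER_ACTIONS = {"read"}  # assumption: read-oriented access only


def is_allowed(role: Role, action: str, granted: bool = False) -> bool:
    """Return True if `role` may perform `action` on a dataset."""
    if role is Role.DATA_OWNER:
        return action in OWNER_ACTIONS
    # A Data User has access only when the Data Owner has granted it.
    return granted and action in USER_ACTIONS


print(is_allowed(Role.DATA_OWNER, "delete_source"))              # True
print(is_allowed(Role.DATA_USER, "read", granted=True))          # True
print(is_allowed(Role.DATA_USER, "delete_source", granted=True)) # False
```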

For a detailed breakdown of role permissions and how to share data with others, see Sharing Data.


Next Steps