Supported Data Sources
Datafi connects to a broad range of relational databases, cloud data warehouses, cloud storage services, and third-party platforms. This page provides the complete compatibility matrix.
Relational Databases and Data Warehouses
| Data Source | Authentication Methods | Driver Type | Key Features |
|---|---|---|---|
| Snowflake | JWT, OAuth | ODBC | Multi-cluster warehouses, time travel queries, semi-structured data (VARIANT). |
| PostgreSQL | Password, SSL client certificates | sqlx (native) | Full SQL support, JSONB columns, materialized views, extensions. |
| Microsoft SQL Server | Windows Authentication, SQL Authentication | ODBC | T-SQL dialect, linked servers, temporal tables. |
| MySQL | Password | Native driver | InnoDB and MyISAM support, full-text search, replication-aware connections. |
| MariaDB | Password | Native driver | MySQL-compatible with additional storage engines and features. |
| Amazon Redshift | IAM roles, Password | JDBC | Columnar storage, Spectrum for S3 queries, concurrency scaling. |
| Google BigQuery | Service Account (JSON key) | REST API | Serverless, petabyte-scale analytics, Standard SQL dialect. |
| Databricks | Personal Access Token | Spark SQL | Unity Catalog integration, Delta Lake support, SQL warehouses. |
| Azure Synapse Analytics | Azure Active Directory, SQL Authentication | ODBC | Dedicated and serverless SQL pools, PolyBase integration. |
| Oracle Database | TNS (Transparent Network Substrate) | OCI (Oracle Call Interface) | PL/SQL support, partitioning, RAC-aware connections. |
| IBM DB2 | Password | ODBC | z/OS and LUW support, federated queries, workload management. |
Datafi selects the highest-performance driver available for each data source. Native drivers (such as sqlx for PostgreSQL) are preferred over ODBC when they provide better performance or feature coverage.
Cloud Storage
You can connect to flat files and semi-structured data stored in cloud object storage. Datafi reads these files and presents them as queryable datasets.
| Storage Provider | Supported Formats | Authentication |
|---|---|---|
| Amazon S3 | CSV, JSON, Parquet | IAM Role, Access Key + Secret Key |
| Azure Blob Storage | CSV, JSON, Parquet | Shared Access Signature (SAS), Azure AD |
| Google Cloud Storage | CSV, JSON, Parquet | Service Account (JSON key) |
How it works:
- You configure a cloud storage connector with the bucket or container path and authentication credentials.
- Datafi scans the specified path and discovers available files.
- Datafi registers each discovered file as a dataset with an inferred schema.
- The Edge node executes queries against these datasets, reading the files on demand.
Cloud storage connectors are best suited for analytical workloads on static or slowly changing data. For high-frequency transactional access, use a dedicated database connector.
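The discovery and registration steps above can be sketched in Python. This is a minimal illustration, not Datafi's implementation: `discover_datasets` and `DiscoveredDataset` are hypothetical names, the object listing is represented by a plain list of keys rather than a real storage SDK call, and schema inference is left out.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Formats listed in the Cloud Storage table above.
SUPPORTED_FORMATS = {".csv": "CSV", ".json": "JSON", ".parquet": "Parquet"}


@dataclass(frozen=True)
class DiscoveredDataset:
    name: str    # dataset name derived from the file name (assumption)
    path: str    # object key inside the bucket or container
    format: str  # one of CSV, JSON, Parquet


def discover_datasets(object_keys, prefix):
    """Return a dataset entry for each supported file under `prefix`."""
    datasets = []
    for key in sorted(object_keys):
        if not key.startswith(prefix):
            continue  # outside the configured path
        suffix = PurePosixPath(key).suffix.lower()
        fmt = SUPPORTED_FORMATS.get(suffix)
        if fmt is None:
            continue  # skip unsupported file types
        datasets.append(DiscoveredDataset(PurePosixPath(key).stem, key, fmt))
    return datasets
```

Calling `discover_datasets(keys, "sales/")` with a list of object keys returns one entry per CSV, JSON, or Parquet file under `sales/` and ignores everything else.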
File-Based Sources
| Format | Description | Schema Detection |
|---|---|---|
| CSV | Comma-separated values. Configurable delimiters, headers, and encoding. | Automatic type inference from sample rows. |
| JSON | JSON arrays or newline-delimited JSON (NDJSON). | Automatic schema inference from document structure. |
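
As a rough illustration of type inference from sample rows, the sketch below reads the first rows of a CSV file and picks the narrowest type that fits every non-empty value in each column. The function names, the type labels, and the `sample_size` default are assumptions for this example. Datafi's actual inference rules are not described on this page.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that fits every non-empty sample value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for type_name, caster in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return type_name
        except ValueError:
            continue  # at least one value does not parse; try the next type
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    return "string"


def infer_csv_schema(text, delimiter=",", sample_size=100):
    """Infer a column-name-to-type mapping from the first `sample_size` rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    columns = {name: [] for name in (reader.fieldnames or [])}
    for i, row in enumerate(reader):
        if i >= sample_size:
            break
        for name in columns:
            columns[name].append(row.get(name) or "")
    return {name: infer_type(vals) for name, vals in columns.items()}
```

Sampling keeps inference cheap on large files, at the cost of possibly mistyping a column whose later rows contain values that the sample did not.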
Third-Party Platform Connectors
Datafi provides connectors for third-party business platforms, allowing you to query operational data alongside your analytical databases.
| Platform | Authentication | Data Access |
|---|---|---|
| Salesforce | OAuth 2.0 | Objects, reports, and SOQL queries. |
| Microsoft Dynamics 365 | Azure AD / OAuth 2.0 | Entities and custom tables via Dataverse API. |
| NetSuite | Token-Based Authentication (TBA) | SuiteQL queries, saved searches, records. |
Third-party connectors expose platform data as standard datasets. You can apply the same policies, build the same data views, and use the same query interface as you would with any relational database.
Planned Connectors
The following connectors are on the roadmap. Availability dates are subject to change.
| Data Source | Status | Expected Driver |
|---|---|---|
| Shopify | In development | REST / GraphQL API |
| HubSpot | Planned | REST API |
| SAP HANA | Planned | ODBC |
| Teradata | Planned | ODBC |
| Elasticsearch | Planned | REST API |
| MongoDB | Planned | Native driver |
| Cassandra | Planned | Native driver |
| ClickHouse | Planned | Native driver |
Connector Configuration Summary
Every connector requires the following baseline configuration:
| Parameter | Description | Required |
|---|---|---|
| Name | A human-readable identifier for the connector. | Yes |
| Type | The data source type (e.g., postgresql, snowflake, s3). | Yes |
| Edge Server | The Edge node that will host this connection. | Yes |
| Authentication | Credentials specific to the data source type. | Yes |
| Host / Endpoint | The database hostname, IP address, or API endpoint. | Yes (except cloud storage) |
| Port | The database port. Defaults are applied per data source type. | No |
| Database / Schema | The default database and schema to connect to. | Varies by type |
| Connection Pool Size | Maximum number of concurrent connections to the data source. | No (default: 10) |
| Timeout | Connection and query timeout in seconds. | No (default: 300s) |
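
The baseline parameters can be modeled as a small Python structure. This is a sketch only: the field names, the `validate` method, and the `azure_blob` and `gcs` type identifiers are hypothetical. Only the defaults (pool size 10, timeout 300 seconds) and the required/optional rules come from the table above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed type names for cloud storage connectors; only "s3" appears in the table.
CLOUD_STORAGE_TYPES = {"s3", "azure_blob", "gcs"}


@dataclass
class ConnectorConfig:
    name: str
    type: str
    edge_server: str
    authentication: dict
    host: Optional[str] = None
    port: Optional[int] = None      # None means "use the per-type default"
    database: Optional[str] = None
    pool_size: int = 10             # documented default
    timeout_seconds: int = 300      # documented default

    def validate(self):
        """Return a list of problems; an empty list means the config is usable."""
        errors = []
        for field_name in ("name", "type", "edge_server"):
            if not getattr(self, field_name):
                errors.append(f"{field_name} is required")
        if not self.authentication:
            errors.append("authentication is required")
        # Host / Endpoint is required for everything except cloud storage.
        if self.type not in CLOUD_STORAGE_TYPES and not self.host:
            errors.append("host is required for this connector type")
        if self.pool_size < 1:
            errors.append("pool_size must be at least 1")
        if self.timeout_seconds <= 0:
            errors.append("timeout_seconds must be positive")
        return errors
```

For example, a `postgresql` config without a `host` fails validation, while an `s3` config without one passes.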
Authentication Methods Reference
| Method | Data Sources | Description |
|---|---|---|
| Password | PostgreSQL, MySQL, MariaDB, Microsoft SQL Server, Synapse, DB2, Redshift | Standard username and password authentication. SQL Server and Synapse call this SQL Authentication. |
| JWT | Snowflake | JSON Web Token for service-to-service authentication. |
| OAuth 2.0 | Snowflake, Salesforce, Dynamics | Delegated authorization using OAuth flows. |
| SSL Client Certificates | PostgreSQL | Mutual TLS using client certificate and key. |
| Windows Authentication | Microsoft SQL Server | Integrated Windows / Kerberos authentication. |
| Azure Active Directory | Synapse, Dynamics, Azure Blob | Azure AD tokens for Microsoft services. |
| IAM Roles | Redshift, S3 | AWS Identity and Access Management. |
| Service Account | BigQuery, GCS | Google Cloud service account JSON key file. |
| Personal Access Token | Databricks | Token-based authentication for Databricks workspaces. |
| TNS | Oracle | Oracle Net Services connection descriptor. |
| Token-Based (TBA) | NetSuite | NetSuite Token-Based Authentication. |
| SAS Token | Azure Blob | Shared Access Signature for Azure Storage. |
| Access Key | S3 | AWS access key ID and secret access key pair. |
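
One way to use this reference is to reject a connector whose authentication method is not listed for its source type before attempting a connection. The sketch below covers only a partial subset of the tables on this page. The lowercase identifiers and the `check_auth_method` function are hypothetical and do not reflect Datafi's API.

```python
# Partial, illustrative subset of the tables on this page.
# The lowercase keys and method labels are hypothetical identifiers.
SUPPORTED_AUTH = {
    "snowflake": {"jwt", "oauth"},
    "postgresql": {"password", "ssl_client_certificate"},
    "bigquery": {"service_account"},
    "databricks": {"personal_access_token"},
    "s3": {"iam_role", "access_key"},
    "netsuite": {"token_based"},
}


def check_auth_method(source_type, method):
    """Return `method` if it is listed for `source_type`, else raise ValueError."""
    try:
        allowed = SUPPORTED_AUTH[source_type]
    except KeyError:
        raise ValueError(f"unknown data source type: {source_type!r}") from None
    if method not in allowed:
        options = ", ".join(sorted(allowed))
        raise ValueError(
            f"{method!r} is not supported for {source_type!r}; "
            f"expected one of: {options}"
        )
    return method
```

Failing early with the list of accepted methods gives a clearer error than a rejected handshake from the data source itself.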
Next Steps
- Multi-Protocol APIs: Learn how to interact with your connected data sources through the available API protocols.
- Request Lifecycle: Understand how queries flow through the platform to your data sources.