Supported Data Sources

Datafi connects to a broad range of relational databases, cloud data warehouses, cloud storage services, and third-party platforms. This page provides the complete compatibility matrix.

Relational Databases and Data Warehouses

| Data Source | Authentication Methods | Driver Type | Key Features |
| --- | --- | --- | --- |
| Snowflake | JWT, OAuth | ODBC | Multi-cluster warehouses, time travel queries, semi-structured data (VARIANT). |
| PostgreSQL | Password, SSL client certificates | sqlx (native) | Full SQL support, JSONB columns, materialized views, extensions. |
| Microsoft SQL Server | Windows Authentication, SQL Authentication | ODBC | T-SQL dialect, linked servers, temporal tables. |
| MySQL | Password | Native driver | InnoDB and MyISAM support, full-text search, replication-aware connections. |
| MariaDB | Password | Native driver | MySQL-compatible with additional storage engines and features. |
| Amazon Redshift | IAM roles, Password | JDBC | Columnar storage, Spectrum for S3 queries, concurrency scaling. |
| Google BigQuery | Service Account (JSON key) | REST API | Serverless, petabyte-scale analytics, Standard SQL dialect. |
| Databricks | Personal Access Token | Spark SQL | Unity Catalog integration, Delta Lake support, SQL warehouses. |
| Azure Synapse Analytics | Azure Active Directory, SQL Authentication | ODBC | Dedicated and serverless SQL pools, PolyBase integration. |
| Oracle Database | TNS (Transparent Network Substrate) | OCI (Oracle Call Interface) | PL/SQL support, partitioning, RAC-aware connections. |
| IBM DB2 | Password | ODBC | z/OS and LUW support, federated queries, workload management. |

Driver Selection

Datafi selects the highest-performance driver available for each data source. Native drivers (such as sqlx for PostgreSQL) are preferred over ODBC when they provide better performance or feature coverage.
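Conceptually, this works like an ordered preference list per source type with ODBC as the fallback. The sketch below is purely illustrative; the names and structure are hypothetical and do not reflect Datafi's internal API.

```python
# Hypothetical sketch of per-source driver preference.
# Datafi's actual selection logic is internal and may differ.
DRIVER_PREFERENCES = {
    "postgresql": ["sqlx", "odbc"],   # native sqlx preferred over ODBC
    "mysql":      ["native", "odbc"],
    "snowflake":  ["odbc"],
    "redshift":   ["jdbc"],
}

def select_driver(source_type: str, installed: set[str]) -> str:
    """Return the highest-preference driver that is actually available."""
    for driver in DRIVER_PREFERENCES.get(source_type, ["odbc"]):
        if driver in installed:
            return driver
    raise RuntimeError(f"No usable driver installed for {source_type}")
```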

Cloud Storage

You can connect to flat files and semi-structured data stored in cloud object storage. Datafi reads these files and presents them as queryable datasets.

| Storage Provider | Supported Formats | Authentication |
| --- | --- | --- |
| Amazon S3 | CSV, JSON, Parquet | IAM Role, Access Key + Secret Key |
| Azure Blob Storage | CSV, JSON, Parquet | Shared Access Signature (SAS), Azure AD |
| Google Cloud Storage | CSV, JSON, Parquet | Service Account (JSON key) |

How it works:

  1. You configure a cloud storage connector with the bucket or container path and authentication credentials.
  2. Datafi scans the specified path and discovers available files.
  3. Files are registered as datasets with inferred schemas.
  4. Queries against these datasets are executed by the Edge node, which reads the files on demand.
Note: Cloud storage connectors are best suited for analytical workloads on static or slowly changing data. For high-frequency transactional access, use a dedicated database connector.
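To give a rough sense of what steps 2 and 3 involve, the sketch below uses pyarrow, a common open-source library that is not part of Datafi, to discover Parquet files under a path and report an inferred schema. The paths are placeholders, and this is only an illustration of the general idea, not Datafi's implementation.

```python
# Illustration of file discovery + schema inference using pyarrow.
import pyarrow.dataset as ds

# A local directory is used here; with credentials configured, an
# object-store URI such as "s3://my-bucket/events/" (placeholder) also works.
dataset = ds.dataset("data/events/", format="parquet")

print(dataset.files)    # files discovered under the path (step 2)
print(dataset.schema)   # schema inferred from those files (step 3)
```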

File-Based Sources

| Format | Description | Schema Detection |
| --- | --- | --- |
| CSV | Comma-separated values. Configurable delimiters, headers, and encoding. | Automatic type inference from sample rows. |
| JSON | JSON arrays or newline-delimited JSON (NDJSON). | Automatic schema inference from document structure. |
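"Type inference from sample rows" is the same idea as reading a limited number of rows and deriving column types from them. The snippet below shows that idea with pandas; the file name is a placeholder and this is not Datafi's inference engine.

```python
# Generic illustration of type inference from sample rows using pandas.
# Datafi's own inference may sample differently or map to different types.
import pandas as pd

sample = pd.read_csv("orders.csv", nrows=1000)   # read only a sample of rows
print(sample.dtypes)                             # inferred column types (int64, float64, object, ...)
```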

Third-Party Platform Connectors

Datafi provides connectors for third-party business platforms, allowing you to query operational data alongside your analytical databases.

| Platform | Authentication | Data Access |
| --- | --- | --- |
| Salesforce | OAuth 2.0 | Objects, reports, and SOQL queries. |
| Microsoft Dynamics 365 | Azure AD / OAuth 2.0 | Entities and custom tables via Dataverse API. |
| NetSuite | Token-Based Authentication (TBA) | SuiteQL queries, saved searches, records. |

Info: Third-party connectors expose platform data as standard datasets. You can apply the same policies, build the same data views, and use the same query interface as you would with any relational database.

Planned Connectors

The following connectors are on the roadmap. Availability dates are subject to change.

| Data Source | Status | Expected Driver |
| --- | --- | --- |
| Shopify | In development | REST / GraphQL API |
| HubSpot | Planned | REST API |
| SAP HANA | Planned | ODBC |
| Teradata | Planned | ODBC |
| Elasticsearch | Planned | REST API |
| MongoDB | Planned | Native driver |
| Cassandra | Planned | Native driver |
| ClickHouse | Planned | Native driver |

Connector Configuration Summary

Every connector requires the following baseline configuration:

| Parameter | Description | Required |
| --- | --- | --- |
| Name | A human-readable identifier for the connector. | Yes |
| Type | The data source type (e.g., postgresql, snowflake, s3). | Yes |
| Edge Server | The Edge node that will host this connection. | Yes |
| Authentication | Credentials specific to the data source type. | Yes |
| Host / Endpoint | The database hostname, IP address, or API endpoint. | Yes (except cloud storage) |
| Port | The database port. Defaults are applied per data source type. | No |
| Database / Schema | The default database and schema to connect to. | Varies by type |
| Connection Pool Size | Maximum number of concurrent connections to the data source. | No (default: 10) |
| Timeout | Connection and query timeout in seconds. | No (default: 300s) |
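Put together, a connector definition covering these parameters might look like the mapping below. The field names and values are illustrative only; they mirror the table above rather than Datafi's exact configuration schema.

```python
# Hypothetical connector definition mirroring the baseline parameters above.
# Field names, values, and structure are illustrative, not Datafi's actual schema.
connector = {
    "name": "warehouse-postgres",          # human-readable identifier
    "type": "postgresql",                  # data source type
    "edge_server": "edge-us-east-1",       # Edge node hosting the connection
    "authentication": {
        "method": "password",
        "username": "datafi_reader",
        "password": "<secret>",            # placeholder; keep real secrets in a secret store
    },
    "host": "db.internal.example.com",
    "port": 5432,                          # optional; default applied per type
    "database": "analytics",
    "schema": "public",
    "connection_pool_size": 10,            # optional; default 10
    "timeout_seconds": 300,                # optional; default 300s
}
```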

Authentication Methods Reference

| Method | Data Sources | Description |
| --- | --- | --- |
| Password | PostgreSQL, MySQL, MariaDB, MSSQL, DB2, Redshift | Standard username and password authentication. |
| JWT | Snowflake | JSON Web Token for service-to-service authentication. |
| OAuth 2.0 | Snowflake, Salesforce, Dynamics | Delegated authorization using OAuth flows. |
| SSL Client Certificates | PostgreSQL | Mutual TLS using client certificate and key. |
| Windows Authentication | MSSQL, Synapse | Integrated Windows / Kerberos authentication. |
| Azure Active Directory | Synapse, Dynamics | Azure AD tokens for Microsoft services. |
| IAM Roles | Redshift, S3 | AWS Identity and Access Management. |
| Service Account | BigQuery, GCS | Google Cloud service account JSON key file. |
| Personal Access Token | Databricks | Token-based authentication for Databricks workspaces. |
| TNS | Oracle | Oracle Net Services connection descriptor. |
| Token-Based (TBA) | NetSuite | NetSuite Token-Based Authentication. |
| SAS Token | Azure Blob | Shared Access Signature for Azure Storage. |
| Access Key | S3 | AWS access key ID and secret access key pair. |
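Outside of Datafi, two of these credential styles look roughly like the snippet below: IAM roles rely on the ambient AWS credential chain, so no secret appears in the configuration, while a Google service account is a JSON key file you hand to the client. Bucket, prefix, and file names are placeholders, and the standard AWS and Google client libraries are used here purely for illustration.

```python
# Generic illustrations of two authentication styles from the table above.
# These use the standard AWS and Google client libraries, not Datafi APIs.

# IAM role: boto3 picks up credentials from the environment (e.g. an EC2
# instance profile), so no access key is written into the configuration.
import boto3
s3 = boto3.client("s3")
objects = s3.list_objects_v2(Bucket="my-bucket", Prefix="exports/")

# Service account: the JSON key file issued for the service account is
# supplied explicitly when constructing the client.
from google.cloud import bigquery
bq = bigquery.Client.from_service_account_json("service-account-key.json")
rows = bq.query("SELECT 1").result()
```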

Next Steps

  • Multi-Protocol APIs -- Learn how to interact with your connected data sources through the available API protocols.
  • Request Lifecycle -- Understand how queries flow through the platform to your data sources.