Key Concepts
This page defines the core concepts you will encounter throughout the Datafi platform. Understanding these terms will help you navigate the documentation, configure your environment, and communicate effectively with your team.
Organizational Concepts
Tenants
A Tenant is the top-level organizational boundary in Datafi. Each tenant represents a single customer or organization. All resources -- workspaces, users, policies, connectors -- belong to exactly one tenant. Tenant isolation is enforced at every layer of the platform, from authentication through query execution.
- Each tenant has its own identity provider configuration and user directory.
- Data from one tenant is never accessible to another tenant.
- Billing, usage metrics, and audit logs are scoped to the tenant level.
Workspaces
A Workspace is a logical environment within a tenant. You use workspaces to separate concerns such as development, staging, and production, or to isolate projects and teams. Each workspace has its own set of connectors, datasets, policies, and data apps.
- A tenant can contain multiple workspaces.
- Users can be granted access to specific workspaces through role assignments.
- Workspaces allow you to maintain independent configurations for different use cases without creating separate tenants.
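The containment relationship between tenants and workspaces can be sketched as a small data model. This is an illustration only: the `Tenant` and `Workspace` classes, their fields, and the `acme` example are hypothetical names and are not part of any Datafi API.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    name: str
    # Each workspace keeps its own connectors (names only, for brevity).
    connectors: list[str] = field(default_factory=list)


@dataclass
class Tenant:
    name: str
    workspaces: dict[str, Workspace] = field(default_factory=dict)

    def add_workspace(self, name: str) -> Workspace:
        # A workspace lives inside exactly one tenant, keyed by name.
        if name in self.workspaces:
            raise ValueError(f"workspace {name!r} already exists")
        workspace = Workspace(name)
        self.workspaces[name] = workspace
        return workspace


# Hypothetical tenant with one workspace per environment.
acme = Tenant("acme")
for env in ("development", "staging", "production"):
    acme.add_workspace(env)

# Configuration added to one workspace does not appear in the others.
acme.workspaces["development"].connectors.append("dev-postgres")
```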
Infrastructure Concepts
Coordinators
A Coordinator is the central orchestration service in the Datafi architecture. It handles authentication, authorization, query planning, query compilation, routing to Edge nodes, and result aggregation. The Coordinator never connects directly to your databases.
- Exposes four protocol endpoints: gRPC (:50051), gRPC-Web (:8001), HTTP (:8000), and MCP (:8002).
- Implements 81 RPC methods covering all platform operations.
- Stateless by design; you can run multiple Coordinator instances behind a load balancer.
For details, see Architecture.
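As a rough sketch of how a client might address several stateless Coordinator instances, the example below maps the four ports listed above to protocol labels and rotates across instances. The hostnames, the dictionary keys, and the `endpoint` helper are assumptions made for illustration; in a real deployment a load balancer would typically do the rotation.

```python
from itertools import cycle

# Ports as listed above; the keys are illustrative labels, not official names.
COORDINATOR_PORTS = {
    "grpc": 50051,
    "grpc-web": 8001,
    "http": 8000,
    "mcp": 8002,
}


def endpoint(host: str, protocol: str) -> str:
    """Build a host:port string for the given protocol."""
    return f"{host}:{COORDINATOR_PORTS[protocol]}"


# Hypothetical hostnames. Because instances are stateless, any of them
# can serve any request, so simple round-robin is enough here.
instances = cycle(["coordinator-1.internal", "coordinator-2.internal"])
targets = [endpoint(next(instances), "grpc") for _ in range(3)]
```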
Edge Servers
An Edge Server (or Edge node) provides secure, minimal database connectivity. It is deployed close to your data sources and exposes only three RPC methods: Query, GetSchema, and Ping.
- Each Edge Server connects to one or more databases using native drivers or ODBC.
- Edge Servers run inside your network perimeter, ensuring data never leaves your infrastructure.
- Communication between the Coordinator and Edge Servers is encrypted with TLS.
For details, see Architecture.
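The narrow surface of an Edge Server can be pictured as an interface with just three operations. The sketch below is hypothetical: the Python method names, their signatures, and the `InMemoryEdge` stand-in are assumptions, and the actual RPC message shapes are not described on this page.

```python
from abc import ABC, abstractmethod


class EdgeServer(ABC):
    """The three RPC methods named above, expressed as a Python interface."""

    @abstractmethod
    def query(self, sql: str) -> list[dict]: ...

    @abstractmethod
    def get_schema(self, table: str) -> dict[str, str]: ...

    @abstractmethod
    def ping(self) -> bool: ...


class InMemoryEdge(EdgeServer):
    """Toy stand-in; a real Edge Server would use a database driver."""

    def __init__(self, tables: dict[str, list[dict]]):
        self._tables = tables

    def query(self, sql: str) -> list[dict]:
        # Toy behavior: only "SELECT * FROM <table>" is understood.
        table = sql.rsplit(" ", 1)[-1]
        return list(self._tables[table])

    def get_schema(self, table: str) -> dict[str, str]:
        # Infer column types from the first row of the table.
        first_row = self._tables[table][0]
        return {col: type(val).__name__ for col, val in first_row.items()}

    def ping(self) -> bool:
        return True


edge = InMemoryEdge({"orders": [{"id": 1, "total": 9.5}]})
```

Keeping the interface this small means there is very little for a caller to reach beyond reading data and reading structure.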
Data Concepts
Connectors
A Connector defines a connection to a specific data source. It stores the connection parameters, authentication credentials, and driver configuration needed to reach a database, API, or file store.
- Each connector is associated with one Edge Server.
- Supported connector types include relational databases (PostgreSQL, Snowflake, MSSQL, MySQL, and others), cloud storage (S3, Azure Blob, GCS), and third-party services (Salesforce, Dynamics, NetSuite).
- Connector credentials are encrypted at rest and never exposed through the API.
For the full list, see Supported Data Sources.
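To illustrate the idea that credentials stay out of API responses, here is a minimal sketch of a connector record. The `Connector` class, its field names, and `to_api` are hypothetical, and the example does not show encryption at rest; it only shows the secret being withheld from output.

```python
from dataclasses import dataclass, field


@dataclass
class Connector:
    name: str
    kind: str          # e.g. "postgresql"
    edge_server: str   # each connector is tied to one Edge Server
    host: str
    # repr=False keeps the secret out of logs and debug output.
    password: str = field(repr=False)

    def to_api(self) -> dict[str, str]:
        """Return the fields a response might expose, without credentials."""
        return {
            "name": self.name,
            "kind": self.kind,
            "edge_server": self.edge_server,
            "host": self.host,
        }


conn = Connector(
    name="orders-db",
    kind="postgresql",
    edge_server="edge-eu-1",
    host="db.internal",
    password="s3cret",
)
```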
Datasets
A Dataset is a logical representation of a table, view, or query result from a connected data source. When you register a connector, Datafi discovers the available tables and surfaces them as datasets.
- Datasets are the primary unit of data access within the platform.
- You can apply policies, create data views, and build data apps on top of datasets.
- Datasets maintain metadata such as column names, data types, row counts, and freshness indicators.
Schemas
A Schema describes the structure of a dataset -- its columns, data types, constraints, and relationships. Datafi retrieves schema information from your data sources through the GetSchema RPC method on Edge nodes.
- Schemas are automatically synced when you create or refresh a connector.
- Schema changes in the underlying database are detected during sync operations.
- You can annotate schema elements with descriptions, tags, and sensitivity classifications.
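Detecting schema changes during a sync amounts to comparing the previously stored structure with the freshly retrieved one. The function below is a simplified sketch under the assumption that a schema can be reduced to a mapping of column name to type; `diff_schema` is a hypothetical helper, not a platform function.

```python
def diff_schema(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two column-name -> type mappings and report the differences."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(col for col in shared if old[col] != new[col]),
    }


previous = {"id": "integer", "email": "text", "age": "integer"}
current = {"id": "bigint", "email": "text", "signup_date": "date"}
changes = diff_schema(previous, current)
```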
Data Views
A Data View is a virtual, policy-aware projection of one or more datasets. Data views let you define exactly which columns and rows a user or group can access without modifying the underlying data.
- Data views are defined declaratively and evaluated at query time.
- You can join datasets from different data sources within a single data view.
- Row-level and column-level filtering are applied based on the policies attached to the view.
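A data view can be thought of as a column list plus a row predicate applied at read time. The sketch below assumes rows are plain dictionaries; the `DataView` class and its fields are illustrative and do not reflect how Datafi defines views internally.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass
class DataView:
    columns: list[str]                 # column-level filtering
    row_filter: Callable[[Row], bool]  # row-level filtering

    def apply(self, rows: list[Row]) -> list[Row]:
        # Evaluated at query time; the source rows are never modified.
        return [
            {col: row[col] for col in self.columns}
            for row in rows
            if self.row_filter(row)
        ]


customers = [
    {"name": "Ada", "region": "EU", "ssn": "123-45-6789"},
    {"name": "Bob", "region": "US", "ssn": "987-65-4321"},
]
eu_view = DataView(columns=["name", "region"], row_filter=lambda r: r["region"] == "EU")
visible = eu_view.apply(customers)
```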
Catalogs
A Catalog is a searchable inventory of all datasets, data views, and metadata within a workspace. The catalog helps you discover available data, understand its structure, and assess its quality.
- The catalog is automatically populated when you register connectors and sync schemas.
- You can enrich catalog entries with descriptions, tags, owners, and lineage information.
- Search supports filtering by data source, schema, tag, sensitivity level, and other attributes.
Application Concepts
Data Apps
A Data App is a lightweight, embeddable application built on top of Datafi data views. Data apps enable you to create interactive dashboards, forms, and reports that consume federated data in real time.
- Data apps use the Client Library (WebAssembly) to query data directly from the browser.
- Access control is inherited from the underlying data views and policies.
- You can embed data apps in existing web applications or share them as standalone URLs.
Agents
An Agent is an AI-powered component that can interact with your data through natural language. Agents use the platform's AI/ML orchestration layer to translate user intent into structured queries, apply policies, and return results.
- Agents respect the same ABAC policies as all other access methods.
- You can configure agents with custom instructions, tool access, and data source scopes.
- Agents communicate through the MCP protocol on port 8002.
Workflows
A Workflow is an automated sequence of actions triggered by events or schedules. Workflows let you build data pipelines, alerting rules, and integration routines without writing custom infrastructure code.
- Workflows can execute queries, transform data, call external APIs, and send notifications.
- Each step in a workflow runs within the security context of the configured service account.
- Workflow execution logs are captured for auditing and debugging.
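The step-by-step nature of a workflow, together with its execution log, can be sketched as a loop over named functions. Everything here is an assumption for illustration: `run_workflow`, the step names, and the log format are hypothetical, and triggers, schedules, and service accounts are left out.

```python
from typing import Callable

Step = Callable[[list[dict]], list[dict]]


def run_workflow(steps: list[tuple[str, Step]], rows: list[dict]):
    """Run steps in order and record each one for auditing."""
    log = []
    for name, step in steps:
        rows = step(rows)
        log.append(f"{name}: {len(rows)} rows")
    return rows, log


steps = [
    # Transform: drop rows with a missing total.
    ("drop_nulls", lambda rows: [r for r in rows if r["total"] is not None]),
    # Transform: flag large orders for a later alerting step.
    ("flag_large", lambda rows: [{**r, "large": r["total"] > 100} for r in rows]),
]
result, log = run_workflow(steps, [{"total": 250}, {"total": None}, {"total": 40}])
```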
Governance Concepts
Policies
A Policy is a declarative rule set that governs access to data within the platform. Datafi uses Attribute-Based Access Control (ABAC), which means policies are evaluated based on attributes of the user, the resource, the action, and the environment.
- Policies are attached to datasets, data views, or workspaces.
- A policy can enforce row-level filtering, column masking, rate limiting, or outright denial.
- Policies are evaluated at the Coordinator during every request -- before any query reaches an Edge node.
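Three of the enforcement outcomes listed above (denial, row-level filtering, and column masking) can be sketched in a few lines. The policy format, the `enforce` function, and the attribute names are hypothetical; the actual policy syntax is not shown on this page.

```python
def enforce(user: dict, rows: list[dict], policy: dict) -> list[dict]:
    """Apply a toy policy before any data is returned to the caller."""
    # Outright denial: the user's department must be on the allow list.
    if user.get("department") not in policy["allowed_departments"]:
        raise PermissionError("access denied by policy")
    # Row-level filtering: users only see rows for their own region.
    visible = [row for row in rows if row["region"] == user["region"]]
    # Column masking: sensitive values are replaced, not removed.
    return [
        {col: ("***" if col in policy["masked_columns"] else val) for col, val in row.items()}
        for row in visible
    ]


rows = [
    {"name": "Ada", "region": "EU", "ssn": "123"},
    {"name": "Bob", "region": "US", "ssn": "456"},
]
policy = {"allowed_departments": {"engineering"}, "masked_columns": {"ssn"}}
result = enforce({"department": "engineering", "region": "EU"}, rows, policy)
```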
Rules
A Rule is an individual condition within a policy. Rules define the specific logic that determines whether an action is allowed, denied, or modified.
- Rules use attribute expressions such as user.department == "engineering" or resource.sensitivity == "PII".
- Multiple rules within a policy are combined using configurable logic (all must pass, any must pass, etc.).
- Rules can reference dynamic attributes such as time of day, IP address, or device type.
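Combining rules with "all must pass" or "any must pass" logic can be sketched as follows. This assumes attributes arrive as a nested dictionary and that each rule is a simple equality check; `get_attr`, `rule`, and `evaluate` are hypothetical helpers, not the platform's rule engine.

```python
from typing import Callable

Rule = Callable[[dict], bool]


def get_attr(context: dict, path: str):
    """Resolve a dotted path such as "user.department" in a nested dict."""
    value = context
    for part in path.split("."):
        value = value[part]
    return value


def rule(path: str, expected) -> Rule:
    """Build a rule that passes when the attribute equals the expected value."""
    return lambda context: get_attr(context, path) == expected


def evaluate(rules: list[Rule], context: dict, mode: str = "all") -> bool:
    """Combine rule results: "all" requires every rule, "any" requires one."""
    combine = {"all": all, "any": any}[mode]
    return combine(r(context) for r in rules)


context = {
    "user": {"department": "engineering"},
    "resource": {"sensitivity": "public"},
}
rules = [rule("user.department", "engineering"), rule("resource.sensitivity", "PII")]
all_pass = evaluate(rules, context, mode="all")
any_pass = evaluate(rules, context, mode="any")
```

In this example only the department rule matches, so the result depends entirely on the combination mode chosen for the policy.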
Concept Relationships
Next Steps
- Request Lifecycle -- See how these concepts interact during query execution.
- Architecture -- Understand the services that power the platform.