
Multi-Protocol APIs

The Datafi Coordinator exposes four distinct API protocols, each optimized for different client environments and use cases. All four protocols provide access to the same underlying platform capabilities, authenticated and authorized through the same JWT and ABAC mechanisms.

Protocol Overview

Port Configuration

| Protocol | Port  | Transport          | Serialization    | Primary Use Case |
|----------|-------|--------------------|------------------|------------------|
| gRPC     | 50051 | HTTP/2             | Protocol Buffers | Server-to-server communication, high-throughput pipelines. |
| gRPC-Web | 8001  | HTTP/1.1 or HTTP/2 | Protocol Buffers | Browser-based applications using the Client Library. |
| HTTP     | 8000  | HTTP/1.1 or HTTP/2 | JSON             | REST-style integrations, scripts, CLI tools, webhooks. |
| MCP      | 8002  | HTTP/1.1 or HTTP/2 | JSON             | AI agent communication via the Model Context Protocol. |
Edge Node Ports

Edge nodes expose a separate set of ports: gRPC on port 50051 for Coordinator-to-Edge communication, and HTTP on port 80 for health checks. You do not interact with Edge ports directly -- the Coordinator handles all routing.

gRPC (Port 50051)

gRPC is the highest-performance protocol available on the Coordinator. It uses HTTP/2 for multiplexed streaming and Protocol Buffers for compact binary serialization.

When to use gRPC:

  • You are building a server-side application in a language with strong gRPC support (Go, Java, Python, Rust, C#, Node.js).
  • You need bidirectional streaming for real-time data feeds.
  • You want the lowest possible latency and the smallest message overhead.
  • You are building internal microservices that communicate with the Coordinator.

Capabilities:

  • Access to all 81 Coordinator RPC methods.
  • Unary, server-streaming, and bidirectional-streaming call patterns.
  • Deadline propagation and cancellation.
  • Built-in retry and backoff policies via gRPC interceptors.
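Retry and backoff policies can be expressed declaratively as a gRPC service config and passed as a channel option. A minimal sketch, assuming the fully qualified service name `datafi.v1.CoordinatorService` (the actual name may differ):

```python
import json

# Hedged sketch: a gRPC retry policy as a service config. The service
# name below is an assumption, not a confirmed Coordinator identifier.
service_config = json.dumps({
    "methodConfig": [{
        "name": [{"service": "datafi.v1.CoordinatorService"}],
        "retryPolicy": {
            "maxAttempts": 4,
            "initialBackoff": "0.1s",
            "maxBackoff": "2s",
            "backoffMultiplier": 2,
            "retryableStatusCodes": ["UNAVAILABLE"],
        },
    }]
})

# The config is attached when the channel is created, e.g.:
# channel = grpc.secure_channel(
#     "coordinator.example.com:50051",
#     grpc.ssl_channel_credentials(),
#     options=[("grpc.service_config", service_config)],
# )
```

Per-call deadlines are set with the `timeout=` argument on stub calls; gRPC propagates the deadline downstream and cancels work once it expires.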

Authentication:

Include the JWT as metadata on every call:

metadata: {"authorization": "Bearer <your-jwt>"}

Example (Python):

import grpc
from datafi.v1 import coordinator_pb2, coordinator_pb2_grpc

channel = grpc.secure_channel("coordinator.example.com:50051", grpc.ssl_channel_credentials())
stub = coordinator_pb2_grpc.CoordinatorServiceStub(channel)

metadata = [("authorization", "Bearer eyJhbGciOiJSUzI1NiIs...")]
response = stub.ExecuteQuery(
    coordinator_pb2.ExecuteQueryRequest(query="..."),
    metadata=metadata,
)

gRPC-Web (Port 8001)

gRPC-Web adapts the gRPC protocol for browser environments. It provides the same Protocol Buffers serialization and type safety as native gRPC, but works over HTTP/1.1 and does not require HTTP/2 end-to-end.

When to use gRPC-Web:

  • You are building a browser-based application using the Datafi Client Library.
  • You want the performance benefits of Protocol Buffers in a web context.
  • You need server-side streaming to the browser (e.g., progressive result loading).

Capabilities:

  • Access to all 81 Coordinator RPC methods.
  • Unary and server-streaming call patterns (bidirectional streaming is not supported in browsers).
  • Automatic integration with the WebAssembly-based Client Library.

How the Client Library uses gRPC-Web:

The Datafi Client Library handles gRPC-Web communication transparently. You interact with a GraphQL API in your application code, and the library translates your queries into gRPC-Web calls under the hood.

import { DatafiClient } from "@datafi/client";

const client = new DatafiClient({
  coordinatorUrl: "https://coordinator.example.com:8001",
  token: "eyJhbGciOiJSUzI1NiIs...",
});

const result = await client.query(`
  query {
    employees(filter: { department: "engineering" }, limit: 50) {
      employee_id
      name
      title
    }
  }
`);

HTTP (Port 8000)

The HTTP API provides a conventional REST-style interface using JSON serialization. It is the most accessible protocol for integrations, scripts, and tools that do not support gRPC.

When to use HTTP:

  • You are integrating with third-party tools, webhooks, or no-code platforms.
  • You are writing quick scripts in languages without gRPC libraries.
  • You prefer working with JSON and standard HTTP methods.
  • You are using curl, Postman, or similar HTTP tools for testing and exploration.

Capabilities:

  • Access to all 81 Coordinator RPC methods via RESTful endpoints.
  • Standard HTTP methods (GET, POST, PUT, DELETE).
  • JSON request and response bodies.
  • Standard HTTP status codes for error handling.

Authentication:

Include the JWT in the Authorization header:

Authorization: Bearer <your-jwt>

Example (curl):

curl -X POST https://coordinator.example.com:8000/v1/query \
  -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIs..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT employee_id, name, title FROM employees WHERE department = '\''engineering'\'' LIMIT 50"
  }'
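The same request can be built in Python with only the standard library; the host, path, and token here mirror the curl example above:

```python
import json
import urllib.request

# Mirrors the curl example: POST a SQL query to the HTTP API on port 8000.
payload = json.dumps({
    "query": "SELECT employee_id, name, title FROM employees "
             "WHERE department = 'engineering' LIMIT 50"
}).encode()

req = urllib.request.Request(
    "https://coordinator.example.com:8000/v1/query",
    data=payload,
    headers={
        "Authorization": "Bearer eyJhbGciOiJSUzI1NiIs...",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending the request (requires a reachable Coordinator):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```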

MCP (Port 8002)

The Model Context Protocol (MCP) is designed for AI agent communication. It enables large language models and autonomous agents to interact with the Datafi platform programmatically.

When to use MCP:

  • You are building AI agents that need to query, explore, or analyze data.
  • You want to expose your data catalog to LLM-based tools.
  • You are integrating with agent frameworks that support MCP (e.g., Claude, LangChain, custom orchestrators).

Capabilities:

  • Schema discovery -- Agents can explore available datasets, columns, and data types.
  • Natural language to query translation -- Combined with the Coordinator's AI/ML orchestration layer.
  • Policy-aware responses -- All ABAC policies are enforced, ensuring agents only access authorized data.
  • Tool-use interface -- Structured tool definitions that agents can invoke.

Authentication:

MCP uses the same JWT authentication as all other protocols:

Authorization: Bearer <your-jwt>
tip

When you configure an AI agent in Datafi, the platform automatically provisions the appropriate MCP endpoint and credentials. You do not need to manage MCP connections manually in most cases.
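For orientation, MCP's tool-use interface follows JSON-RPC 2.0 framing, with tool invocations carried in a `tools/call` request. The tool name and arguments below are illustrative assumptions, not documented Coordinator tools:

```python
import json

# Hedged sketch of an MCP tool-call request body (JSON-RPC 2.0 framing).
# "query_dataset" and its arguments are hypothetical examples.
rpc_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_dataset",
        "arguments": {"dataset": "employees", "limit": 50},
    },
}
body = json.dumps(rpc_request)

# The body would be POSTed to the Coordinator's MCP endpoint on port 8002
# with the same "Authorization: Bearer <your-jwt>" header as the other protocols.
```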

Performance Specifications

The following specifications apply across all protocols unless otherwise noted.

| Parameter | Value | Notes |
|-----------|-------|-------|
| Maximum message size | 1 GB | Applies to both request and response payloads. |
| Default request timeout | 5 minutes | Configurable per request via deadline or timeout headers. |
| Result serialization | Apache Arrow | Results are serialized in the Apache Arrow columnar format for high-performance processing. |
| Concurrent connections | No hard limit | Bounded in practice by available Coordinator resources and load-balancer configuration. |
| TLS | Required | All protocols require TLS in production deployments. |
Large Result Sets

While the maximum message size is 1 GB, you should use pagination or streaming for large result sets. Streaming is available on gRPC and gRPC-Web protocols. For the HTTP protocol, use cursor-based pagination.
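Cursor-based pagination typically loops until the server stops returning a cursor. A self-contained sketch with a stubbed fetch function -- the `cursor` and `next_cursor` field names are assumptions about the HTTP API's response shape:

```python
def fetch_page(cursor=None):
    # Placeholder for an HTTP POST to the query endpoint with a cursor
    # parameter; stubbed with two static pages so the loop is runnable.
    pages = {
        None: {"rows": [1, 2], "next_cursor": "abc"},
        "abc": {"rows": [3], "next_cursor": None},
    }
    return pages[cursor]

def fetch_all():
    # Accumulate rows page by page until no cursor is returned.
    rows, cursor = [], None
    while True:
        page = fetch_page(cursor)
        rows.extend(page["rows"])
        cursor = page["next_cursor"]
        if cursor is None:
            return rows

print(fetch_all())  # [1, 2, 3]
```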

Protocol Comparison

| Feature | gRPC | gRPC-Web | HTTP | MCP |
|---------|------|----------|------|-----|
| Serialization | Protocol Buffers | Protocol Buffers | JSON | JSON |
| Streaming | Bidirectional | Server-side only | No | No |
| Browser support | No | Yes | Yes | Via agent frameworks |
| Type safety | Strong (protobuf) | Strong (protobuf) | Weak (JSON) | Weak (JSON) |
| Latency | Lowest | Low | Moderate | Moderate |
| Payload size | Smallest | Small | Largest | Largest |
| Best for | Services | Web apps | Scripts / integrations | AI agents |

When to Use Each Protocol

  • Building a web application? Use gRPC-Web through the Client Library. You get near-native performance, type safety, and automatic authentication handling.
  • Building a backend service? Use gRPC for the best performance and strongest typing. Use HTTP if your language or framework lacks gRPC support.
  • Writing a script or one-off integration? Use HTTP. It requires no code generation, and you can test endpoints with curl.
  • Building an AI agent? Use MCP. It provides structured tool definitions and integrates with the platform's AI orchestration layer.

RPC Method Distribution

| Service | Protocol | Method Count |
|---------|----------|--------------|
| Coordinator | gRPC / gRPC-Web / HTTP / MCP | 81 methods |
| Edge | gRPC (TLS) | 3 methods (Query, GetSchema, Ping) |

The 81 Coordinator methods span all platform operations including authentication, workspace management, connector configuration, dataset operations, policy management, data app administration, agent orchestration, and workflow execution.

Next Steps

  • Architecture -- Review how the protocols fit into the three-service architecture.
  • Request Lifecycle -- Understand the full query flow from authentication through result aggregation.