Overview
Datafi lets customers self-host certain components of its infrastructure to meet security requirements and to improve performance. The components that can be self-hosted are:
Edge Server: A containerized workload that initiates requests to data sources and enforces governance policies on the response data.
Vector Database: Stores vector embeddings of unstructured data ingested into Datafi.
Storage Account: Stores all files uploaded to the Datafi workspace.
Prerequisites
Before you begin, ensure you have the following:
Deploying Edge Server
The Edge Server runs as a Docker container and can be deployed on Azure Container Instances. Follow these steps:
Step 1: Create an Azure Container Instance
Open the Azure Portal and navigate to Container instances.
Click + Create (labeled + Add in older versions of the portal) to create a new container instance.
Fill in the required details:
Resource Group: Select an existing resource group or create a new one.
Container Name: Provide a name for your container.
Region: Select the region where you want to deploy the container.
Image Source: Select Docker Hub (shown as Other registry in newer versions of the portal).
Image Type: Public.
Image: Enter datafi/es.
Size: Select the appropriate size for your container.
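For scripted deployments, the portal steps above can be sketched with the Azure CLI `az container create` command. This is a minimal sketch, not an official Datafi procedure: the resource group, container name, region, DNS label, CPU and memory values are hypothetical placeholders, and port 443 is an assumption based on the https:// URL used in Step 5. The `--dns-name-label` option gives the container a stable public FQDN, which Step 2 relies on. Set `AZ=echo` to print the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: create the Edge Server container group with the Azure CLI.
# All values below are hypothetical placeholders; adjust them to your setup.
set -euo pipefail

# Set AZ=echo to print the command instead of executing it.
AZ="${AZ:-az}"

RESOURCE_GROUP="datafi-rg"
CONTAINER_NAME="datafi-edge"
REGION="eastus"
DNS_LABEL="datafi-edge-example"   # must be unique within the region

# Assumption: the Edge Server listens on port 443 (HTTPS).
create_edge_container() {
  "$AZ" container create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CONTAINER_NAME" \
    --location "$REGION" \
    --image datafi/es \
    --os-type Linux \
    --cpu 1 \
    --memory 2 \
    --ip-address Public \
    --dns-name-label "$DNS_LABEL" \
    --ports 443
}
```

Call `create_edge_container` after signing in with `az login`.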
Step 2: Retrieve the Endpoint
Once the container instance is created, retrieve its endpoint (FQDN or public IP address) from the container instance's Overview page in the Azure Portal.
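A container group created with a DNS name label gets an FQDN of the form `<label>.<region>.azurecontainer.io`; without a label, only the public IP address is available. The sketch below shows both ways to obtain the endpoint: predicting the FQDN, and querying it with `az container show`. `aci_fqdn` and `aci_endpoint` are hypothetical helper names, and the resource group and container name are passed in as placeholders.

```shell
#!/usr/bin/env bash
# Sketch: two ways to obtain the Edge Server endpoint.
set -euo pipefail

# 1) Predict it: --dns-name-label yields <label>.<region>.azurecontainer.io
aci_fqdn() {
  local label="$1" region="$2"
  printf '%s.%s.azurecontainer.io\n' "$label" "$region"
}

# 2) Ask Azure for it (requires a signed-in Azure CLI).
#    Falls back to the public IP when no DNS label was set.
aci_endpoint() {
  local rg="$1" name="$2" fqdn
  fqdn="$(az container show --resource-group "$rg" --name "$name" \
            --query ipAddress.fqdn --output tsv)"
  if [ -n "$fqdn" ]; then
    printf '%s\n' "$fqdn"
  else
    az container show --resource-group "$rg" --name "$name" \
      --query ipAddress.ip --output tsv
  fi
}
```

For example, `aci_endpoint datafi-rg datafi-edge` prints whichever endpoint Azure reports for that container group.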
Step 3: Generate the Key Locally
Run the following command on your local machine (Docker must be installed) to generate the key:
docker run --rm -e ENDPOINT=<hostname/ip-address> datafi/es
Replace <hostname/ip-address> with the endpoint retrieved in Step 2. The command generates a key and prints it to the terminal.
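The command in Step 3 expects a bare hostname or IP address, not a URL. A small wrapper can strip an accidental `http://` or `https://` prefix and any trailing path, then capture the key in a shell variable. This assumes the `datafi/es` image writes only the key to standard output; if it prints anything else, copy the key manually instead. `normalize_endpoint` and `generate_key` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: generate the Edge Server key and keep it in a variable.
set -euo pipefail

# Reduce "https://host/path" to "host" so ENDPOINT is a bare hostname or IP.
normalize_endpoint() {
  local e="$1"
  e="${e#http://}"
  e="${e#https://}"
  e="${e%%/*}"
  printf '%s\n' "$e"
}

# Assumption: the image prints only the key on stdout when ENDPOINT is set.
generate_key() {
  local endpoint
  endpoint="$(normalize_endpoint "$1")"
  docker run --rm -e ENDPOINT="$endpoint" datafi/es
}
```

Usage: `KEY="$(generate_key https://datafi-edge-example.eastus.azurecontainer.io/)"` runs the container with `ENDPOINT=datafi-edge-example.eastus.azurecontainer.io`.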
Step 4: Update the Container Instance with the Key
Return to the Azure Portal and navigate to the container instance you created.
Update the container instance to add the following environment variable:
KEY: the key generated in Step 3.
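In Azure Container Instances, environment variables are set at deployment time, so adding KEY means redeploying the container group under the same name; `az container create` updates an existing group when the name already matches. The sketch passes the key with `--secure-environment-variables` so its value is not shown in the container group's properties. All values are hypothetical placeholders and should match the original deployment, because properties such as CPU and memory cannot be changed by redeploying. `redeploy_with_key` is a hypothetical helper name; set `AZ=echo` to print the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: redeploy the Edge Server container group with KEY set.
set -euo pipefail

# Set AZ=echo to print the command instead of executing it.
AZ="${AZ:-az}"

# Hypothetical placeholders: reuse the exact values from the original deployment.
RESOURCE_GROUP="datafi-rg"
CONTAINER_NAME="datafi-edge"
REGION="eastus"
DNS_LABEL="datafi-edge-example"

# Assumption: the Edge Server listens on port 443 (HTTPS).
redeploy_with_key() {
  local key="$1"
  if [ -z "$key" ]; then
    echo "KEY is empty; generate it first" >&2
    return 1
  fi
  "$AZ" container create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CONTAINER_NAME" \
    --location "$REGION" \
    --image datafi/es \
    --os-type Linux \
    --cpu 1 \
    --memory 2 \
    --ip-address Public \
    --dns-name-label "$DNS_LABEL" \
    --ports 443 \
    --secure-environment-variables KEY="$key"
}
```

Usage: `redeploy_with_key "$KEY"`, where `KEY` holds the value printed by the command in Step 3.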
Step 5: Verify the Deployment
Make a curl request to the endpoint to verify the deployment:
curl -i https://<your-endpoint>
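The curl check can be scripted. This guide does not say which status code the Edge Server returns for `/`, so the sketch treats any HTTP response as evidence that the container is reachable over HTTPS, and fails only when curl receives no response at all (curl reports that as status `000`). `edge_status`, `responded`, and `verify_edge` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check that the Edge Server answers HTTPS requests.
set -euo pipefail

# Print only the HTTP status code; curl writes 000 when no response arrives.
edge_status() {
  local endpoint="$1"
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://${endpoint}" || true
}

# Any status other than 000 means an HTTP response came back.
responded() {
  [ "$1" != "000" ]
}

verify_edge() {
  local endpoint="$1" code
  code="$(edge_status "$endpoint")"
  if responded "$code"; then
    echo "Edge Server answered with HTTP $code"
  else
    echo "No HTTP response from https://$endpoint" >&2
    return 1
  fi
}
```

Usage: `verify_edge datafi-edge-example.eastus.azurecontainer.io`. Use `curl -i` as shown above when you also need the response headers.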
Deploying Vector Database
Datafi supports multiple vector databases; Redis is the preferred option. To deploy Redis on Azure, follow these steps:
Step 1: Create an Azure Cache for Redis
Open the Azure Portal and navigate to Azure Cache for Redis.
Click + Create (labeled + Add in older versions of the portal) to create a new Redis cache.
Fill in the required details:
Resource Group: Select an existing resource group or create a new one.
DNS Name: Provide a unique name for your Redis cache.
Location: Select the region where you want to deploy the Redis cache.
Pricing Tier: Select the appropriate pricing tier.
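The equivalent Azure CLI command is `az redis create`. In this sketch the cache name, resource group, region, SKU (`Basic`, `Standard`, or `Premium`), and VM size are hypothetical placeholders. If Datafi uses Redis vector search, note that it depends on the RediSearch module, which Azure Cache for Redis offers only in its Enterprise tiers (created with `az redisenterprise create` instead); confirm with Datafi which tier and modules your deployment needs. Set `AZ=echo` to print the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: create an Azure Cache for Redis instance with the Azure CLI.
set -euo pipefail

# Set AZ=echo to print the command instead of executing it.
AZ="${AZ:-az}"

# Hypothetical placeholders.
RESOURCE_GROUP="datafi-rg"
CACHE_NAME="datafi-vectors-example"   # becomes <name>.redis.cache.windows.net
REGION="eastus"
SKU="Standard"                        # Basic | Standard | Premium
VM_SIZE="c1"                          # c0-c6 for Basic/Standard, p1-p5 for Premium

create_redis_cache() {
  "$AZ" redis create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CACHE_NAME" \
    --location "$REGION" \
    --sku "$SKU" \
    --vm-size "$VM_SIZE"
}
```

Call `create_redis_cache` after `az login`; provisioning a cache can take several minutes.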
Step 2: Configure Redis
Once the Redis cache is created, configure it according to your requirements.
Step 3: Connect Datafi to Redis
On the Datafi AI/ML admin page, enter the Redis connection details to connect Datafi to the Redis cache.
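For a Basic, Standard, or Premium cache, the connection details can be gathered from the CLI: the host name follows the pattern `<name>.redis.cache.windows.net`, TLS connections use port 6380, and `az redis list-keys` returns the access keys. The exact fields on the Datafi AI/ML admin page are not described in this guide, so the sketch only prints the values for you to map by meaning. `redis_host` and `redis_connection_details` are hypothetical helper names; Enterprise-tier caches use a different host name and port.

```shell
#!/usr/bin/env bash
# Sketch: print Azure Cache for Redis connection details for Datafi.
set -euo pipefail

# Set AZ=echo to print the az command instead of executing it.
AZ="${AZ:-az}"

# Host name pattern for Basic/Standard/Premium caches.
redis_host() {
  printf '%s.redis.cache.windows.net\n' "$1"
}

# Prints host, TLS port, and primary access key (a secret).
redis_connection_details() {
  local rg="$1" name="$2"
  echo "host: $(redis_host "$name")"
  echo "port: 6380 (TLS)"
  echo "password: $("$AZ" redis list-keys --resource-group "$rg" \
    --name "$name" --query primaryKey --output tsv)"
}
```

The output contains the access key, so avoid running it where terminal output is logged.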
Deploying Storage Account
Datafi uses Azure Blob Storage to store files. Follow these steps to create and configure a storage account:
Step 1: Create a Storage Account
Open the Azure Portal and navigate to Storage accounts.
Click + Create (labeled + Add in older versions of the portal) to create a new storage account.
Fill in the required details:
Resource Group: Select an existing resource group or create a new one.
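Storage Account Name: Provide a name for your storage account. It must be 3 to 24 characters long, contain only lowercase letters and numbers, and be unique across Azure.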
Region: Select the region where you want to deploy the storage account.
Performance: Select the appropriate performance tier.
Replication: Select the appropriate replication option.
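In the Azure CLI, the Performance and Replication choices together form the `--sku` value of `az storage account create`; for example, `Standard_LRS` is Standard performance with locally redundant storage, and `Standard_GRS` adds geo-redundancy. The sketch checks the storage account naming rule before calling Azure. The names, region, and default SKU are hypothetical placeholders, and `valid_storage_name` and `create_storage_account` are hypothetical helper names. Set `AZ=echo` to print the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: create a general-purpose v2 storage account for Datafi files.
set -euo pipefail

# Set AZ=echo to print the command instead of executing it.
AZ="${AZ:-az}"

# Storage account names: 3-24 characters, lowercase letters and digits only.
valid_storage_name() {
  [[ "$1" =~ ^[a-z0-9]{3,24}$ ]]
}

# Args: resource group, account name, region, optional SKU.
create_storage_account() {
  local rg="$1" name="$2" region="$3" sku="${4:-Standard_LRS}"
  if ! valid_storage_name "$name"; then
    echo "invalid storage account name: $name" >&2
    return 1
  fi
  "$AZ" storage account create \
    --name "$name" \
    --resource-group "$rg" \
    --location "$region" \
    --sku "$sku" \
    --kind StorageV2
}
```

Usage: `create_storage_account datafi-rg datafifiles eastus Standard_GRS`. Global uniqueness is only checked by Azure itself.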
Step 2: Configure Blob Storage
Once the storage account is created, configure the Blob storage according to your requirements.
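The right configuration depends on your requirements. One common baseline, sketched below, is to disallow anonymous public access on the account and create a private container for Datafi's files. The container name `datafi-files` is a hypothetical placeholder, since this guide does not say whether Datafi expects a particular container name. `--auth-mode login` uses your Microsoft Entra ID sign-in instead of the account key and needs a data-plane role such as Storage Blob Data Contributor. `valid_container_name` and `configure_blob` are hypothetical helper names; set `AZ=echo` to print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: lock down public access and create a private blob container.
set -euo pipefail

# Set AZ=echo to print the commands instead of executing them.
AZ="${AZ:-az}"

# Container names: 3-63 chars, lowercase letters, digits, single inner hyphens.
valid_container_name() {
  local n="$1"
  [ "${#n}" -ge 3 ] && [ "${#n}" -le 63 ] && [[ "$n" =~ ^[a-z0-9]+(-[a-z0-9]+)*$ ]]
}

# Args: resource group, storage account, container name.
configure_blob() {
  local rg="$1" account="$2" container="$3"
  if ! valid_container_name "$container"; then
    echo "invalid container name: $container" >&2
    return 1
  fi
  "$AZ" storage account update --name "$account" --resource-group "$rg" \
    --allow-blob-public-access false
  "$AZ" storage container create --name "$container" --account-name "$account" \
    --auth-mode login --public-access off
}
```

Usage: `configure_blob datafi-rg datafifiles datafi-files`.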
Step 3: Connect Datafi to Blob Storage
On the Datafi AI/ML admin page, enter the Blob storage connection details to connect Datafi to the storage account.
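If the Datafi admin page asks for a connection string, the Azure CLI can print one with `az storage account show-connection-string`; the blob service endpoint itself follows the pattern `https://<account>.blob.core.windows.net`. This guide does not state which of these values Datafi expects, so treat the sketch as a starting point. `blob_endpoint` and `storage_connection_details` are hypothetical helper names; set `AZ=echo` to print the az command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: print Blob storage connection details for Datafi.
set -euo pipefail

# Set AZ=echo to print the az command instead of executing it.
AZ="${AZ:-az}"

# Public blob service endpoint for a storage account.
blob_endpoint() {
  printf 'https://%s.blob.core.windows.net\n' "$1"
}

# Args: resource group, storage account. The connection string embeds an account key.
storage_connection_details() {
  local rg="$1" account="$2"
  echo "endpoint: $(blob_endpoint "$account")"
  echo "connection string: $("$AZ" storage account show-connection-string \
    --resource-group "$rg" --name "$account" --query connectionString --output tsv)"
}
```

Handle the connection string as a secret, since it grants access to the whole storage account.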