PRQL Guide
PRQL (Pipelined Relational Query Language) is a modern query language that compiles to database-specific SQL. It serves as an intermediate representation between natural language and SQL -- readable like prose, yet precise enough for the query engine. Datafi compiles your PRQL into the appropriate SQL dialect for whatever connector you are targeting.
PRQL is an open-source language with its own specification. For the full language reference, visit the PRQL Book.
How PRQL Works
A PRQL query is a pipeline. Each line takes the output of the previous line, transforms it, and passes it to the next. You read the query top to bottom, and each step is explicit.
from employees
filter department == "Engineering"
select {employee_id, name, salary}
sort {-salary}
take 10
Datafi compiles this into the SQL dialect required by your datasource (e.g., PostgreSQL, Snowflake, BigQuery) before executing it.
Core Transforms
from -- Specify the Source Table
Every PRQL query begins with from, which identifies the table you are querying.
| PRQL | SQL |
|---|---|
from customers | SELECT * FROM customers |
filter -- Filter Rows
filter restricts the result set to rows matching a condition. Chain multiple conditions with && (AND) and || (OR).
from orders
filter (total_amount > 500 && status == "completed")
select -- Choose Columns
select specifies which columns to include in the output. Wrap multiple columns in curly braces.
from customers
select {customer_id, first_name, last_name, email}
sort -- Order Results
sort orders the result set. Prefix a column name with - for descending order.
from products
sort {category, -price}
take -- Limit Results
take restricts the number of rows returned, equivalent to SQL LIMIT.
from orders
take 25
aggregate -- Compute Aggregations
aggregate computes summary statistics across the result set.
from orders
aggregate {
total_orders = count order_id,
avg_amount = average total_amount,
max_amount = max total_amount
}
group -- Group and Aggregate
group partitions data by one or more columns, then applies an aggregate within each group.
from orders
group region (
aggregate {
order_count = count order_id,
total_revenue = sum total_amount
}
)
join -- Combine Tables
join merges rows from two tables based on a condition. Specify join type with side:left or side:right.
from orders
join customers (orders.customer_id == customers.customer_id)
select {orders.order_id, customers.first_name, orders.total_amount}
Side-by-Side Summary
| PRQL | SQL Equivalent |
|---|---|
from table | SELECT * FROM table |
filter condition | WHERE condition |
select {cols} | SELECT cols |
sort {col} | ORDER BY col ASC |
sort {-col} | ORDER BY col DESC |
take n | LIMIT n |
aggregate {fn col} | SELECT fn(col) |
group col (aggregate {...}) | GROUP BY col with aggregates |
join table (condition) | JOIN table ON condition |
Complete Example
PRQL:
from orders
join customers (orders.customer_id == customers.customer_id)
filter orders.order_date >= @2025-01-01
group customers.region (
aggregate {
order_count = count orders.order_id,
total_revenue = sum orders.total_amount
}
)
sort {-total_revenue}
take 10
Equivalent SQL:
SELECT
customers.region,
COUNT(orders.order_id) AS order_count,
SUM(orders.total_amount) AS total_revenue
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
WHERE orders.order_date >= '2025-01-01'
GROUP BY customers.region
ORDER BY total_revenue DESC
LIMIT 10
Next Steps
- DQL Syntax Reference -- If you prefer SQL-style syntax, explore the DQL reference.
- Query Examples -- Browse practical, runnable examples for common use cases.