Skip to main content

PRQL Guide

PRQL (Pipelined Relational Query Language) is a modern query language that compiles to database-specific SQL. It serves as an intermediate representation between natural language and SQL -- readable like prose, yet precise enough for the query engine. Datafi compiles your PRQL into the appropriate SQL dialect for whatever connector you are targeting.

Learn More

PRQL is an open-source language with its own specification. For the full language reference, visit the PRQL Book.


How PRQL Works

A PRQL query is a pipeline. Each line takes the output of the previous line, transforms it, and passes it to the next. You read the query top to bottom, and each step is explicit.

from employees
filter department == "Engineering"
select {employee_id, name, salary}
sort {-salary}
take 10

Datafi compiles this into the SQL dialect required by your datasource (e.g., PostgreSQL, Snowflake, BigQuery) before executing it.


Core Transforms

from -- Specify the Source Table

Every PRQL query begins with from, which identifies the table you are querying.

PRQLSQL
from customersSELECT * FROM customers

filter -- Filter Rows

filter restricts the result set to rows matching a condition. Chain multiple conditions with && (AND) and || (OR).

from orders
filter (total_amount > 500 && status == "completed")

select -- Choose Columns

select specifies which columns to include in the output. Wrap multiple columns in curly braces.

from customers
select {customer_id, first_name, last_name, email}

sort -- Order Results

sort orders the result set. Prefix a column name with - for descending order.

from products
sort {category, -price}

take -- Limit Results

take restricts the number of rows returned, equivalent to SQL LIMIT.

from orders
take 25

aggregate -- Compute Aggregations

aggregate computes summary statistics across the result set.

from orders
aggregate {
total_orders = count order_id,
avg_amount = average total_amount,
max_amount = max total_amount
}

group -- Group and Aggregate

group partitions data by one or more columns, then applies an aggregate within each group.

from orders
group region (
aggregate {
order_count = count order_id,
total_revenue = sum total_amount
}
)

join -- Combine Tables

join merges rows from two tables based on a condition. Specify join type with side:left or side:right.

from orders
join customers (orders.customer_id == customers.customer_id)
select {orders.order_id, customers.first_name, orders.total_amount}

Side-by-Side Summary

PRQLSQL Equivalent
from tableSELECT * FROM table
filter conditionWHERE condition
select {cols}SELECT cols
sort {col}ORDER BY col ASC
sort {-col}ORDER BY col DESC
take nLIMIT n
aggregate {fn col}SELECT fn(col)
group col (aggregate {...})GROUP BY col with aggregates
join table (condition)JOIN table ON condition

Complete Example

PRQL:

from orders
join customers (orders.customer_id == customers.customer_id)
filter orders.order_date >= @2025-01-01
group customers.region (
aggregate {
order_count = count orders.order_id,
total_revenue = sum orders.total_amount
}
)
sort {-total_revenue}
take 10

Equivalent SQL:

SELECT
customers.region,
COUNT(orders.order_id) AS order_count,
SUM(orders.total_amount) AS total_revenue
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
WHERE orders.order_date >= '2025-01-01'
GROUP BY customers.region
ORDER BY total_revenue DESC
LIMIT 10

Next Steps