Unit tests give you confidence that your code produces the output data you expect. Unit tests differ from assertions in that assertions are used to check the contents of datasets in your data warehouse, while unit tests are used to validate your SQL code. Assertions verify data, and unit tests verify logic.
A SQLX unit test passes fake input to a table
or view
query, checking that the output rows match some expected output data.
Suppose we have the following view
SQLX query:
1config { 2 type: "view", 3 name: "age_groups" 4} 5SELECT 6 FLOOR(age / 5) * 5 AS age_group, 7 COUNT(1) AS user_count 8FROM ${ref("ages")} 9GROUP BY age_group
We might want to write a unit test to check that the age_groups
query works as we expect it to. Create a file definitions/test_age_groups.sqlx
:
1config { 2 type: "test", 3 dataset: "age_groups" 4} 5 6input "ages" { 7 SELECT 15 AS age UNION ALL 8 SELECT 21 AS age UNION ALL 9 SELECT 24 AS age UNION ALL 10 SELECT 34 AS age 11} 12 13SELECT 15 AS age_group, 1 AS user_count UNION ALL 14SELECT 20 AS age_group, 2 AS user_count UNION ALL 15SELECT 30 AS age_group, 1 AS user_count
This unit test replaces the ages
input to the age_groups
query, and checks that the resulting output rows match the three expected age_group
rows.
input "schemaName", "inputName" { ... }
.
A unit test fails if the actual output from the dataset is not equal to the expected output. This means that:
Note that unit tests do not fail if columns are not output in the same order.
As with unit testing in other languages and frameworks, it's considered bad practice for a unit test to be non-hermetic. This means that running your test
should have no dependencies on any state in your data warehouse. Thus, for queries you'd like to test, all input datasets should be referenced using ref()
or resolve()
. The test then injects the fake inputs
provided for each input dataset.
If you use the Dataform CLI, you can run all tests in your project directory with the dataform test
command.