Assertions enable you to check the state of data produced by other actions.
An assertion query is written to find rows that violate one or more rules. If the query returns any rows, then the assertion will fail.
Dataform provides a convenient way to define assertions as part of a dataset's config
settings.
Here's a complete example:
definitions/my_table.sqlx1config { 2 type: "table", 3 assertions: { 4 uniqueKey: ["user_id"], 5 nonNull: ["user_id", "customer_id"], 6 rowConditions: [ 7 'signup_date is null or signup_date > "2019-01-01"', 8 'email like "%@%.%"' 9 ] 10 } 11} 12select ...
For all configuration options, view the reference documentation for dataset assertions.
If the uniqueKey
property is set, the resulting assertion will fail if there is more than one row in the dataset with the same values for all of the column(s).
1config { 2 type: "table", 3 assertions: { 4 uniqueKey: ["user_id"] 5 } 6} 7select ...
Multiple key columns can be provided.
The generated assertion will be called <original_dataset_name>_assertions_uniqueKey
.
To assert that a set of columns are never null, provide an array of the column names with the nonNull
property:
1config { 2 type: "table", 3 assertions: { 4 nonNull: ["user_id", "customer_id", "email"] 5 } 6} 7select ...
The generated assertion will be called <original_dataset_name>_assertions_rowConditions
, because nonNull
is just shorthand for custom row conditions (see below).
For assertions that require custom logic to be evaluated against rows, use the rowConditions
property. Each row condition should be a SQL expression that is expected to evaluate to true
if the assertion should pass.
1config { 2 type: "table", 3 assertions: { 4 rowConditions: [ 5 'signup_date is null or signup_date > "2019-01-01"', 6 'email like "%@%.%"' 7 ] 8 } 9} 10select ...
Each row will be evaluated against each condition, and all rows must pass all conditions for the assertion to pass.
The generated assertion will be called: <original_dataset_name>_assertions_rowConditions
.
Assertions can also be defined manually for more advanced use cases, or for testing datasets that aren't created by Dataform.
To write a manual assertion, create a new SQLX file and set the type to assertion
in the config
block:
definitions/custom_assertion.sqlx1config { 2 type: "assertion" 3}
To write the assertion, write a new SQL query below the config block that should return zero rows. The best way to think about writing this queries is that they are queries that look for errors.
For example, to assert that fields a
, b
, and c
are never NULL
in a dataset named sometable
, create a file definitions/assert_sometable_not_null.sqlx
:
definitions/assert_sometable_not_null.sqlx1config { type: "assertion" } 2 3SELECT 4 * 5FROM 6 ${ref("sometable")} 7WHERE 8 a IS NULL 9 OR b IS NULL 10 OR c IS NULL
Another common requirement is to check that all values for a particular field or combination of fields are unique in a dataset.
For example, in a daily_customer_stats
dataset, there should only ever be a single row for each combination of the date
and customer_id
fields.
You can assert these requirements as follows:
1config { type: "assertion" } 2 3WITH base AS ( 4SELECT 5 date, 6 customer_id, 7 SUM(1) as rows 8FROM ${ref("daily_customer_stats")} 9) 10SELECT * FROM base WHERE rows > 1
This query will find any keys where there exists more than 1 row for that key (and thus are not unique).
Dataform automatically creates a view in your warehouse containing the results of the compiled assertion query. This makes it easy to inspect the rows that caused the assertion to fail without increasing storage requirements or pulling any data out of your warehouse.
Assertions create views in a seperate schema to your default schema. This is configured in your dataform.json
file.
Given a default assertion schema dataform_assertions
and an assertion file called definitions/some_assertion.sqlx
:
1config { type: "assertion" } 2SELECT * FROM ${ref("example")} WHERE test > 1
Dataform will create a view called dataform_assertions.some_assertion
in your warehouse using the following query:
1CREATE OR REPLACE VIEW dataform_assertions.some_assertion AS SELECT * from dataform.example WHERE test > 1
You can manually inspect this view to debug failing assertions.
Assertions create named actions in your project that can be depended upon by using the dependencies
configuration parameter.
If you would like another dataset, assertion, or operation to only run if a specific assertion passes, you can add the assertion to that action's dependencies.
For example, given two datasets called table1
and table2
, and an assertion called table1_not_null
, if you want to ensure that table2
only
runs if table1_not_null
passes, you could add it as a dependency in definitions/table2.sqlx
:
definitions/table2.sqlx1config { 2 type: "view", 3 dependencies: [ "table1_not_null" ] 4} 5 6SELECT * FROM ${ref("table1")} LEFT JOIN ...