Data quality tests and documenting datasets

Learn how to set up data quality tests using assertions and how to document your datasets.

Data quality tests

Adding tests to a project helps validate that your models are working correctly. These tests are run every time your project updates, sending alerts if the tests fail. Data quality tests in Dataform are called assertions.

Assertions

Assertions enable you to check the state of data produced by other actions. An assertion query is written to find rows that violate one or more rules. If the query returns any rows, the assertion fails. Several types of assertions are available, including uniqueness checks and null checks. The simplest way to define assertions is as part of a dataset's `config` settings.
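To make the idea of "a query that finds violating rows" concrete, here is a minimal sketch of an assertion written as its own file. It assumes a hypothetical file named definitions/order_stats_unique_id.sqlx and an order_stats table with an id column; the query returns one row for each id that appears more than once, so any returned row means the assertion fails.

```sqlx
config {
  type: "assertion"
}

-- Hypothetical standalone assertion: one row is returned per duplicated id.
-- If this query returns any rows, the assertion fails.
SELECT
  id,
  COUNT(1) AS occurrences
FROM
  ${ref("order_stats")}
GROUP BY
  id
HAVING
  COUNT(1) > 1
```

This is roughly the kind of check a uniqueness assertion performs; writing it by hand is mainly useful when a rule is too specific to express in a dataset's config settings.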

We want to create an assertion for our order_stats table that fails if more than one row in the dataset has the same value for id.

  1. Navigate back to your order_stats table:

    • Click on the hamburger menu in the top left-hand corner of your project and click on Develop Project.
    • Navigate to the order_stats file.
  2. In the config block add your assertion:

    • Copy and paste this example into your file, replacing the config block that is already there.
config {
  type: "table",
  assertions: {
    uniqueKey: ["id"]
  }
}
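Other built-in checks can sit alongside uniqueKey in the same assertions block. The sketch below is illustrative only: it assumes order_stats has customer_id, item_count, and amount columns, and the specific rules (no null IDs, at least one item per order, no negative amounts) are assumptions made for the example rather than requirements of this tutorial. Only the config block is shown; the table's SQL query follows it unchanged.

```sqlx
config {
  type: "table",
  assertions: {
    uniqueKey: ["id"],
    nonNull: ["id", "customer_id"],
    rowConditions: [
      "item_count > 0",
      "amount >= 0"
    ]
  }
}
```

Each entry in rowConditions is a SQL expression that is expected to be true for every row; rows where it is not true cause the assertion to fail.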

Dataform automatically creates a view in your warehouse containing the results of the compiled assertion query. This makes it easy to inspect the rows that caused the assertion to fail.

You can also choose to use assertions as dependencies. Assertions create named actions in your project that can be depended upon by using the dependencies config parameter. If you would like another dataset, assertion, or operation to only run if a specific assertion passes, you can add the assertion to that action's dependencies.
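As a rough sketch of an assertion used as a dependency, the file below defines a table that only runs once an assertion has passed. Both names are hypothetical: order_stats_unique_id stands in for the name of an assertion that exists in your project, and the daily order count is just a placeholder query.

```sqlx
config {
  type: "table",
  dependencies: ["order_stats_unique_id"]
}

-- "order_stats_unique_id" above is a hypothetical assertion name;
-- replace it with the name of an assertion defined in your project.
SELECT
  order_date,
  COUNT(1) AS orders
FROM
  ${ref("order_stats")}
GROUP BY
  order_date
```

With this in place, a failure of the assertion should stop the table from being rebuilt on top of data that broke the rule.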

Documenting your dataset

Dataform allows you to add documentation to the datasets defined in your project. Table and field descriptions are added using the same config block where you wrote your assertion.

  1. In the config block write your documentation:

    • Copy and paste the example below into your file, replacing the config block that is already there.
    • This config block adds table and field descriptions to your table.
config {
  type: "table",
  description: "This table joins orders information from Shopify & payment information from Stripe",
  columns: {
    order_date: "The date when a customer placed their order",
    id: "Order ID as defined by Shopify",
    order_status: "The status of an order e.g. sent, delivered",
    customer_id: "Unique customer ID",
    payment_status: "The status of a payment e.g. pending, paid",
    payment_method: "How the customer chose to pay",
    item_count: "The number of items the customer ordered",
    amount: "The amount the customer paid"
  },
  assertions: {
    uniqueKey: ["id"]
  }
}
  2. View the Dependency tree:

    • Open the hamburger menu in the top left-hand corner and navigate back to the Dependency tree tab.
    • Click on the order_stats table on the left.
    • You can view your table and field descriptions in the data catalog.
The Data Catalog

You have now created an assertion that fails if any value in your id field is duplicated, and you have added descriptions to your order_stats table and its fields.

You can find more information on assertions and documentation in our docs.

What's next

Getting set up

Learn how to create a new BigQuery project and generate warehouse credentials.

Building your data model

Learn how to connect to a warehouse and create and publish your first dataset.

Managing dependencies

Learn how to use the ref function in Dataform and how to view your project in the Dependency tree.

Setting up a schedule

Learn how to set up a schedule and alerts in Dataform.

Committing your changes

Learn how to commit changes you've made in your Dataform project.
