Managing dependencies

Learn how to use the ref function in Dataform and how to view your project in the Dependency tree.

Dataform's ref function lets you reference another dataset in your project by name.

This provides two advantages:

  • You don’t have to provide the full SQL dataset name.
  • Any dataset that is referenced by a query will be automatically added to that query's dependencies. Dependency queries are always executed before dependent queries to ensure correctness.

In this step you'll learn how to manage dependencies in Dataform.

  1. Create a second table called customers, following the same process as before:

    • Click New Dataset and select the table template.
    • Give your table the name customers.
    • Click Create.
  2. Define your dataset:

  • To create your table, use the following query:
SELECT
  customers.id AS id,
  customers.first_name AS first_name,
  customers.last_name AS last_name,
  customers.email AS email,
  customers.country AS country,
  COUNT(orders.id) AS order_count,
  SUM(orders.amount) AS total_spent

FROM
  dataform-demos.dataform_tutorial.crm_customers AS customers
  -- ref() resolves to the full name of order_stats and adds it as a dependency
  LEFT JOIN ${ref('order_stats')} orders
    ON customers.id = orders.customer_id

WHERE
  customers.id IS NOT NULL
  AND customers.first_name <> 'Internal account'
  AND customers.country IN ('UK', 'US', 'FR', 'ES', 'NG', 'JP')

-- Group by the five customer columns selected above
GROUP BY 1, 2, 3, 4, 5
  • Paste the query into customers.sqlx, below the config block.
  • This query uses the ref function, which lets you reference any other table defined in a Dataform project.
  • You can see the dependencies of a dataset in the right-hand sidebar.
  • If you open the compiled query, you can see that the ref function has been replaced with the fully qualified table name.
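
As a rough sketch of what to expect in the compiled query, the shortened example below shows the join after ref('order_stats') has been resolved. The project and schema names (your-project, your_schema) are placeholders, not values from this tutorial: the real ones come from your project's default database and schema settings, so your compiled output will differ.

```sql
-- Compiled form (sketch): the ${ref('order_stats')} expression is gone and a
-- fully qualified table name stands in its place.
-- `your-project` and `your_schema` are placeholder names.
SELECT
  customers.id AS id,
  COUNT(orders.id) AS order_count
FROM
  `dataform-demos.dataform_tutorial.crm_customers` AS customers
  LEFT JOIN `your-project.your_schema.order_stats` AS orders
    ON customers.id = orders.customer_id
GROUP BY 1
```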

The ref() function

To use the ref() function, pass it the database, schema, and name of the dataset you're referencing. Database and schema are optional: you only need to specify them if multiple datasets in your project share the same name. Read more about the ref() function in the [reference](/reference#ICommonContext).
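
The argument forms described above can be sketched as follows. This is a minimal example, assuming a dataset named order_stats exists in your project. The names my_database and my_schema are hypothetical, and the more specific forms are shown in comments so that the file still compiles in a project that does not have them.

```sql
config { type: "view" }

-- Name only: the project's default database and schema are used.
SELECT
  customer_id,
  amount
FROM ${ref("order_stats")}

-- More specific forms, for when several datasets share the same name
-- (my_database and my_schema are hypothetical):
--   ref("my_schema", "order_stats")
--   ref("my_database", "my_schema", "order_stats")
--   ref({ database: "my_database", schema: "my_schema", name: "order_stats" })
```

In a real query, each of these forms is wrapped in ${ } in the same way as the name-only call.
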
  3. Once you can see that your query is valid, publish the table to your warehouse by clicking Publish Table.

  4. View the Dependency tree:

    • Navigate to the menu in the top left-hand corner of your project and click the Dependency Tree tab.
The dependency tree

Here you can see a visualisation of your entire Dataform project. Every time Dataform creates a run and executes SQL in your warehouse, it will update the actions in the correct dependency order.

You now have two tables created in your warehouse, one called order_stats and one called customers. The customers table depends on order_stats and will start running once order_stats has completed.
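
To see how the chain extends, here is a minimal sketch of a third dataset that builds on customers. The dataset name top_customers and its file (for example top_customers.sqlx) are hypothetical and not part of this tutorial. Because the query references customers through ref, it would run only after customers has completed, which in turn waits for order_stats.

```sql
config { type: "view" }

-- Hypothetical dataset: customers who have placed at least one order.
-- ref("customers") adds customers as a dependency, so the execution order
-- becomes order_stats, then customers, then this view.
SELECT
  id,
  email,
  total_spent
FROM ${ref("customers")}
WHERE order_count > 0
```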

For more detailed info on managing dependencies in Dataform, see our docs.

What's next

Getting set up

Learn how to create a new BigQuery project and generate warehouse credentials.

Building your data model

Learn how to connect to a warehouse and create and publish your first dataset.

Setting up a schedule

Learn how to set up a schedule and alerts in Dataform.

Data quality tests and documenting datasets

Learn how to set up data quality tests using assertions and how to document your datasets.

Committing your changes

Learn how to commit changes you've made in your Dataform project.
