Managing dependencies

Learn how to use the ref function in Dataform and how to view your project in the Dependency tree.

Dataform's ref function lets you reference another dataset in your project by name.

This provides two advantages:

  • You don’t have to provide the full SQL dataset name.
  • Any dataset that is referenced by a query will be automatically added to that query's dependencies. Dependency queries are always executed before dependent queries to ensure correctness.
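
As a rough illustration of both points, the fragment below contrasts a reference written with ref and the kind of fully qualified name it is replaced with at compile time. The names `my-project` and `dataform` are placeholders chosen for this sketch; the real values come from your own project settings, and the exact compiled form may differ by warehouse.

```sql
-- Written in a .sqlx file: only the dataset name is needed.
SELECT * FROM ${ref('order_stats')}

-- Roughly what the compiled query looks like on BigQuery.
-- `my-project` and `dataform` are placeholder names, not values from this tutorial.
SELECT * FROM `my-project.dataform.order_stats`
```

Because the reference goes through ref, Dataform also records order_stats as a dependency of whichever dataset contains the query.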

In this step you'll learn how to manage dependencies in Dataform.

  1. Create a second table called customers, following the same process as before:

    • Click New Dataset and select the table template.
    • Name your table customers.
    • Click Create.
  2. Define your dataset:

  • To create your table, use the following query:
SELECT
  customers.id AS id,
  customers.first_name AS first_name,
  customers.last_name AS last_name,
  customers.email AS email,
  customers.country AS country,
  COUNT(orders.id) AS order_count,
  SUM(orders.amount) AS total_spent

FROM
  dataform-demos.dataform_tutorial.crm_customers AS customers
  LEFT JOIN ${ref('order_stats')} orders
    ON customers.id = orders.customer_id

WHERE
  customers.id IS NOT NULL
  AND customers.first_name <> 'Internal account'
  AND country IN ('UK', 'US', 'FR', 'ES', 'NG', 'JP')

GROUP BY 1, 2, 3, 4, 5
  • Paste the query into customers.sqlx, below the config block.
  • This query uses the ref function, which lets you reference any other table defined in your Dataform project.
  • The dependencies of a dataset are listed in the right-hand sidebar.
  • If you open the compiled query, you can see that the ref function has been replaced with the fully qualified table name.
  3. Once the query is shown as valid, publish the table to your warehouse by clicking Publish Table.

  4. View the Dependency tree:

    • Open the menu in the top-left corner of your project and click the Dependency Tree tab.
The dependency tree

Here you can see a visualisation of your entire Dataform project. Every time Dataform creates a run and executes SQL in your warehouse, it executes the actions in the correct dependency order.

You now have two tables in your warehouse: order_stats and customers. Because customers depends on order_stats, it starts running only after order_stats has completed.
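
The dependency between the two tables comes entirely from the ref call in the query. A trimmed-down sketch of how customers.sqlx might be laid out is shown here. It assumes the table template generates a config block of the form `config { type: "table" }`; the exact contents of the generated block may differ in your project, and the query is shortened to two columns for readability.

```sql
config {
  type: "table"
}

-- Shortened version of the customers query from this step.
-- The config block above is an assumed form of what the table template generates.
SELECT
  customers.id AS id,
  COUNT(orders.id) AS order_count

FROM
  dataform-demos.dataform_tutorial.crm_customers AS customers
  -- The ref call on the next line is what makes customers depend on order_stats.
  LEFT JOIN ${ref('order_stats')} orders
    ON customers.id = orders.customer_id

GROUP BY 1
```

Replacing the ref call with a hard-coded table name would still produce a valid query, but Dataform would no longer know that order_stats has to run first.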

For more detailed info on managing dependencies in Dataform, see our docs.

What's next

Getting set up

Learn how to create a new BigQuery project and generate warehouse credentials.

Building your data model

Learn how to connect to a warehouse and create and publish your first dataset.

Setting up a schedule

Learn how to set up a schedule and alerts in Dataform.

Data quality tests and documenting datasets

Learn how to set up data quality tests using assertions and how to document your datasets.

Committing your changes

Learn how to commit changes you've made in your Dataform project.
