Managing dependencies

Start building out your dataform pipelines using SQLX files and managing dependencies.

This tutorial demonstrates how dependency trees are made within a project.

Create a new file

Create a new file by clicking on the + button next to the definitions folder and create a new SQLX file named onboarding_table . For now, ignore the other options; these are more advanced features which will be covered in later documentation.

New file

Write a query

In the text editor, write a simple SQLX statement:

1config { 
2  type: "view" 
6  1 as one,
7  2 as two,
8  3 as three

You will see the right sidebar update with onboarding_table which is the name of the dataset, a tag for the type - a view - as well as a validation message and two action buttons.

The Preview results button executes the query in your warehouse and returns the output at the bottom of the page. This can be useful during query development to check that the query returns expected output.

Run the query

Clicking the Run this action button will open a dialog box with a few options for running your actions. Click Run now to continue.

Upon successful completion of the run, a new view named dataform.onboarding_table will be created in your warehouse.

By default, Dataform creates datasets and views under a schema named dataform . You can change this in the dataform.json in your project configuration.

Run node

Create a table instead of a view

To create a table instead of a view, simply change your script's config block:

1config { 
2  type: "table" 

You will see the tag on the right sidebar update from view to table .

Dataform provides many other options for dataset creation. Learn about publishing datasets and incremental datasets.

Referencing other datasets

Dataform provides methods that enable you to easily reference another dataset in your project using the ref() function:

2  * 
4  ${ref('my_table')}

This provides two advantages:

  • You don’t have provide the full SQL dataset name.
  • Any dataset that is referenced by a query will be automatically added to that query's dependencies. Dependency queries are always executed before dependent queries to ensure correctness.

Create a new file

Create a new SQLX file named onboarding_table2 and add the following content:

1config { 
2  type: "view" 
6  * 
8  ${ref('onboarding_table')}

You will see the right sidebar update with the name of the view, onboarding_table2 . Clicking on Compiled query will display the fully qualified query to be run in your data warehouse. The ${ref()} function will be replaced with the actual name of the dataset in your data warehouse. Note that the dependency is also listed just above the compiled query.

Your compiled script will differ depending on your cloud data warehouse. This example uses Redshift.

Referencing a dataset that hasn't yet been created in your data warehouse will cause validation errors. These validation errors are warning you that the dataset you are referencing doesn't exist (yet). However, since Dataform will run your project's scripts in dependency order, you can safely ignore these errors.

View dependency tree

The overview page (linked at the top left of Dataform) gives you an overview of your project. The overview includes a visualization of the dependency tree of your project, containing the two datasets created by your scripts.

In bigger projects containing dozens or hundreds of datasets with complicated interdependencies, the overview page can help you keep track of your project's overall structure.

What's next

Creating a new project

Learn how to create a new dataform web project and connect to your warehouse.