Publishing tables

Publish a dataset

1. Create a new file

Create a new file by clicking on the + button next to the definitions folder and create a new SQL file named onboarding_dataset. Ignore for now the other options in this modal (Assertions, Operations, Javascript), they are more advanced features that will be covered later.

2. Write query

In the text editor, write a simple SQL statement like the following:

SELECT 1 as one,
       2 as two,
       3 as three

You will see the right sidebar updating with the name of the table/view, onboarding_dataset, a tag for the type - a view by default - as well as a validation message and two action buttons.

The execute query button executes the query in your warehouse and returns the output at the bottom of the page. That is useful while you are developing to check if the query is returning the output you are expecting.

3. Run node

Now click Run this node button. That will open a modal with a few options for running your nodes. Click Run now in the modal to continue.

This will create a new view in your data warehouse, that will be called dataform.onboarding_dataset.

By default, dataform creates views and table under a schema named dataform, you can change this in your dataform.json in your project configuration.

4. Create a table instead of a view

By default, dataform creates views in your data warehouse. To create a table instead of a view, simply add the following statement at the top of your script:

--js type("table");

You will see the tag on the right updating from view to table instantaneously.

Many more options are available to publishing datasets, you can read more on publishing datasets and incremental tables.

Referencing other models

Dataform provides methods that enable you to easily reference another table in your project using the following function:

select * from ${ref('my_table')}

That provides two advantages:

  • You don’t have provide the full SQL table name
  • This lists the table as a dependency to the table. When the queries are executed against the warehouse, this makes sure they are done so in the correct order.

1. Create a new file

Create a new SQL file named onboarding_dataset2 and add the following content:

select * from ${ref('onboarding_dataset')}

You will see the right sidebar updating with the name of the view, onboarding_dataset2. Click on Show compiled to see the actual query that will be run in your data warehouse. The ${ref()} function will be replaced with the actual name of the table in your data warehouse. You can also notice the dependency listed just above the query validation message.

Your compiled script will differ depending on your cloud data warehouse. This example uses Redshift.

You can get validation errors if you reference a table that hasn't yet been created in your data warehouse. The validation error comes from your data warehouse and will warn you that the table you are selecting from doesn't exist. Dataform will run your script in dependency order so you can safely ignore this error.

3. View dependency tree

The overview page (linked at the top left of the page) gives you an overview of your project. There you can see the dependency tree graph of your project containing your two materialisations and a table listing them.

This example is very simple but over time you may have have dozens or hundreds of datasets that depend on each other and the table and the graph will help you and you team keeping track of them.

Now that you have successfully published your first datasets, learn how to use version control to