Publish datasets

Learn how to publish tables and views in your warehouse.

Publishing tables

Create a view

Create a new example.sqlx file in your project under the definitions/ folder:

example.sqlx
1config { type: "view" }
2SELECT 1 AS TEST

Upon running this query a view called schema.example will be created (where schema is the Dataform schema defined in your dataform.json file).

Note that non-SQL statements are stripped during query compilation, so the final executable query in this case would be:

1SELECT 1 AS TEST

For a list of all configuration options for datasets, view the ITableConfig reference documentation.

Trailing semi-colons should be omitted from queries.

Create a table

To create a table, i.e. a full independent copy of the query result, set the type of the query to table .

1config { type: "table" }
2SELECT 1 AS TEST

For a list of all configuration options for datasets, view the ITableConfig reference documentation.

Referencing other datasets

Dataform provides methods that enable you to easily reference another dataset in your project without having to provide the full SQL dataset name.

definitions/source.sqlx
1SELECT 1 AS sourcedata

definitions/ref_example.sqlx
1SELECT * FROM ${ref("source")}

In order to reference the dataset created by a source.sqlx file, the value that should be passed to the ref() function is "source" , i.e. the name of the file defining the dataset (without the file extension).

The ref() function

The use the ref() function, pass it the database, schema, and name of the dataset you're referencing. Database and schema are optional: you'll only need to specify these if there are multiple datasets in your project with the same name. Read more about the ref() function [here](/reference#ICommonContext).

The query will be compiled into the following SQL before it is run:

1SELECT * FROM "dataform_schema"."source"

Any dataset that is referenced by a query will automatically be added to that query's dependencies. Dependency queries are always executed before dependent queries to ensure pipeline correctness.

Adding custom dependencies

If you want to manually add a dependency to a query - one that is not already explicitly referenced with ref() - you should configure the file's dependencies :

1config { dependencies: [ "some_table" ] }
2SELECT * FROM ...

Multiple dependencies may be provided in a single invocation:

1config { dependencies: [ "some_table", "some_other_table" ] }
2SELECT * FROM ...

Disable a dataset

To stop a query being run, you can disable it. This will keep the dataset as part of your graph, but stop it from executing when you run your project. This can be useful for example if the relevant query breaks for some reason and you don't want your pipeline to fail while it's being fixed.

1config { disabled: true}
2SELECT * FROM ...

Executing statements before or after dataset creation

You can specify pre_operations { ... } and post_operations { ... } to configure Dataform to execute one or more SQL statements before or after creating a dataset.

Example: Granting dataset access with post_operations

The following example defines a post-query operation to configure dataset access permissions. It makes use of the built-in self() method which returns the fully-qualified name of the current dataset.

1SELECT * FROM ...
2
3post_operations {
4  GRANT SELECT ON ${self()} TO GROUP "allusers@dataform.co"
5}

To specify multiple operations, separate them with --- (using --- to separate statements in pre- and post-operations ensures compatibility across all DB engines, while ; only works with engines that support multiple statements in a single query):

1post_operations {
2  GRANT SELECT ON ${self()} TO GROUP "allusers@dataform.co"
3  ---
4  GRANT SELECT ON ${self()} TO GROUP "otherusers@dataform.co"
5}

Overriding a dataset's schema or name

By default, a dataset's schema is set to the default schema chosen when initializing a project, usually dataform , and a dataset's name is set to the name of the corresponding file. This can be overridden on a per-file basis by specifying the schema and/or name options. For example, to create a dataset called example in the schema other_schema :

1config {
2  schema: "other_schema",
3  name: "example"
4}
5SELECT * FROM ...

To reference datasets with overridden schemas, use the fully qualified name of the dataset:

1SELECT * FROM ${ref("example")}
If a dataset's name is overridden, you should use that name when calling the ref() function. For example, a dataset with config { name: "overridden_name" } would be referenced using ref("overridden_name") .

Warehouse specific configuration

BigQuery

For more information on configuring BigQuery datasets, such as enabling dataset partitioning, check out the BigQuery guide.

Redshift

For more information on configuring Redshift datasets, such as sort keys and dist keys, check out the Redshift guide.

SQL Data Warehouse

For more information on configuring SQL Data Warehouse datasets, such as distribution settings, check out the SQL Data Warehouse guide.

What's next

Incremental datasets

Learn how to configure tables that update incrementally.

Document datasets

Learn how to add data documentation for your tables and views.

Sitemap