Publishing datasets

Create a view

Create a new example.sqlx file in your project under the definitions/ folder:

config { type: "view" }
SELECT 1 AS TEST

Upon running this query a view called schema.example will be created (where schema is the Dataform schema defined in your dataform.json file).

Note that non-SQL statements are stripped during query compilation, so the final executable query in this case would be:

SELECT 1 AS TEST
Trailing semi-colons should be omitted from queries.
File names must be unique within your definitions/ folder, even across different directories, because these determine the name of the dataset or view created within your warehouse.

Create a table

To create a table, i.e. a full independent copy of the query result, set the type of the query to table.

config { type: "table" }
SELECT 1 AS TEST

Referencing other datasets

Dataform provides methods that enable you to easily reference another dataset in your project without having to provide the full SQL dataset name.

definitions/source.sql:

SELECT 1 AS sourcedata

definitions/ref_example.sql:

SELECT * FROM ${ref("source")}

In order to reference the dataset created by a source.sql file, the value that should be passed to the ref() function is "source", i.e. the name of the file defining the dataset (without the file extension).

The query will be compiled into the following SQL before it is run:

SELECT * FROM "dataform_schema"."source"

Any dataset that is referenced by a query will automatically be added to that query's dependencies. Dependency queries are always executed before dependent queries to ensure pipeline correctness.

Adding custom dependencies

If you want to manually add a dependency to a query - one that is not already explicitly referenced with ref() - you should configure the file's dependencies:

config { dependencies: [ "some_table" ] }
SELECT * FROM ...

Multiple dependencies may be provided in a single invocation:

config { dependencies: [ "some_table", "some_other_table" ] }
SELECT * FROM ...

Disable a dataset

To stop a query being run, you can disable it. This will keep the dataset as part of your graph, but stop it from executing when you run your project. This can be useful for example if the relevant query breaks for some reason and you don't want your pipeline to fail while it's being fixed.

config { disabled: true}
SELECT * FROM ...

Executing statements before or after dataset creation

You can specify pre_operations { ... } and post_operations { ... } to configure Dataform to execute one or more SQL statements before or after creating a dataset.

Example: Granting dataset access with post_operations

The following example defines a post-query operation to configure dataset access permissions. It makes use of the built-in self() method which returns the fully-qualified name of the current dataset.

SELECT * FROM ...

post_operations {
  GRANT SELECT ON ${self()} TO GROUP "allusers@dataform.co"
}

To specify multiple operations, separate them with ---:

post_operations {
  GRANT SELECT ON ${self()} TO GROUP "allusers@dataform.co"
  ---
  GRANT SELECT ON ${self()} TO GROUP "otherusers@dataform.co"
}

Overriding a dataset's schema or name

By default, a dataset's schema is set to the default schema chosen when initializing a project, usually dataform, and a dataset's name is set to the name of the corresponding file. This can be overridden on a per-file basis by specifying the schema and/or name options. For example, to create a dataset called example in the schema other_schema:

config {
  schema: "other_schema",
  name: "example"
}
SELECT * FROM ...

To reference datasets with overridden schemas, use the fully qualified name of the dataset:

SELECT * FROM ${ref("example")}
If a dataset's name is overridden, you should use that name when calling the ref() function. For example, a dataset with config { name: "overridden_name" } would be referenced using ref("overridden_name").

Warehouse specific configuration

BigQuery

For more information on configuring BigQuery datasets, such as enabling dataset partitioning, check out the BigQuery guide.

Redshift

For more information on configuring Redshift datasets, such as sort keys and dist keys, check out the Redshift guide.

SQL Data Warehouse

For more information on configuring SQL Data Warehouse datasets, such as distribution settings, check out the SQL Data Warehouse guide.

Further reading