Publishing datasets

Create a view

Create a new example.sql file in your project under the definitions/ folder:

SELECT 1 AS TEST

By default, upon running this query, a view called schema.example will be created (where schema is the Dataform schema defined in your dataform.json file).

Trailing semi-colons should be omitted from queries.
File names must be unique within your definitions/ folder, even across different directories, because these determine the name of the dataset or view created within your warehouse.

Create a table

By default, datasets are created as views in your warehouse. To create an actual table, i.e. a full independent copy of the query result, you can use Dataform's JavaScript configuration syntax to set the type of the query to table.

--js type("table");
SELECT 1 AS TEST

Note that configuration statements starting with --js are stripped during query compilation, so the final executable query in this case would be:

SELECT 1 AS TEST

Referencing other datasets

Dataform provides methods that enable you to easily reference another dataset in your project without having to provide the full SQL dataset name.

definitions/source.sql:

SELECT 1 AS sourcedata

definitions/ref_example.sql:

SELECT * FROM ${ref("source")}

In order to reference the dataset created by a source.sql file, the value that should be passed to the ref() function is "source", i.e. the name of the file defining the dataset (without the file extension).

The query will be compiled into the following SQL before it is run:

SELECT * FROM "dataform_schema"."source"

Any dataset that is referenced by a query will automatically be added to that query's dependencies. Dependency queries are always executed before dependent queries to ensure pipeline correctness.

Adding custom dependencies

If you want to manually add a dependency to a query - one that is not already explicitly referenced with ref() - you can use the dependencies() JavaScript configuration method:

--js dependencies("some_table");
SELECT * FROM ...

Multiple dependencies may be provided in a single invocation:

--js dependencies(["some_table", "some_other_table"]);
SELECT * FROM ...

Disable a dataset

To stop a query being run, you can use the JavaScript disabled() method. This will keep the dataset as part of your graph, but stop it from executing when you run your project. This can be useful for example if the relevant query breaks for some reason and you don't want your pipeline to fail while it's being fixed.

--js disabled();
SELECT * FROM ...

Executing statements before or after dataset creation

To execute one or more SQL statements before or after creating a dataset, you can use the preOps() and postOps() configuration methods.

Example: Granting dataset access with postOps()

The following example defines a post-query operation to configure dataset access permissions. It makes use of the built-in self() method which returns the fully-qualified name of the current dataset.

--js postOps(`GRANT SELECT ON ${self()} TO GROUP "allusers@dataform.co"`);
SELECT * FROM ...
To use the self() method within the string argument provided to postOps(), you must use JavaScript's ES6 template string syntax, wrapping the string in back-ticks: `template_string`

Multiple arguments can be provided to postOps() and preOps(), each a seperate query, which will be executed in the provided order. You can alternatively use a multi-line configuration statement:

/*js
postOps([
  `GRANT SELECT ON ${self()} TO GROUP "allusers@dataform.co"`,
  `GRANT SELECT ON ${self()} TO GROUP "otherusers@dataform.co"`
]);
*/

Overriding the output schema

By default, all datasets are created within the default schema chosen when initializing a project, usually dataform.

This can be overridden on a per-file basis by changing the file name to have a fully qualified dataset name. For example, to create a dataset called example in the schema other_schema, create a file with the name definitions/other_schema.example.sql. This will override the output schema of the dataset to other_schema. To reference datasets with overridden schemas, use the fully qualified name of the dataset:

SELECT * FROM ${ref("other_schema.table")}

Configuration statements

Each of the following examples of JavaScript configuration statements results in equivalent behavior.

Single line statements

Every line starting with --js will be interpreted as JavaScript.

--js type("table");
--js disabled(true);

Multi-line statements

To write JavaScript over multiple lines, create a comment block that starts with /*js and ends with */:

/*js
type("table");
disabled(true);
*/

Setting multiple properties with config()

Most settings for a dataset can be passed in a single JSON object using the config() method:

/*js
config({
  type: "table",
  disabled: true
});
*/

Warehouse specific configuration

BigQuery

For more information on configuring BigQuery datasets, such as enabling dataset partitioning, check out the BigQuery guide.

Redshift

For more information on configuring Redshift datasets, such as sort keys and dist keys, check out the Redshift guide.

Further reading