Publishing tables

Create a view

Create a new file in your project under the definitions/ folder, such as definitions/example.sql:

select 1 as test

This will create a view by default, that will be called example in the dataform schema defined in the dataform.json file.

Trailing semi-colons should be omitted at the end of the query.

Create a table

By default, tables are actually created as views in your warehouse. To create a copy of the query result as an actual table from the query, you can use Dataform's JavaScript configuration syntax to set the type of the query to table.

--js type("table");
select 1 as test

Configuration statements starting with --js will be stripped from the compiled query, leaving the actual final query to be executed as:

select 1 as test

Referencing other tables

Dataform provides methods that enable you to easily reference another table in your project, without having to provide the full SQL table name.

definitions/source.sql:

select 1 as sourcedata

definitions/example.sql:

select * from ${ref("source")}

The value we provide to the ref() function is the name of the file that defines the table you would like to reference, without the extension. In this case, as the source table is defined in the file source.sql, we use the string value "source".

The query will be compiled into the following SQL before it is run:

select * from "dataform_schema"."source"

Any table that is referenced in a query will be automatically added to it's dependencies. When the queries are executed against the warehouse, this makes sure they are done so in the correct order.

File names must be unique within your definitions/ folder, even across directories, as these determine the name of the table or view created within your warehouse.

Adding custom dependencies

You may want to make sure a table is created only after another table is created, but without explicitly referencing it in the query itself. For this, you can use another JavaScript configuration method called dependencies(...):

--js dependencies("some_table");
select * from ...

Multiple dependencies can also be provided in a single call:

--js dependencies(["some_table", "some_other_table"]);
select * from ...

Disable a table

To stop a table being created, you can call the disabled() method. This will keep the table as part of your graph, but stop it from executing when you run your project. This is useful if the query breaks, and you don't want your entire pipeline to fail while it's fixed.

--js disabled();
select * from ...

Executing statements before or after creating tables

To execute one or more SQL statements before or after creating a table, you can use the preOps() and postOps() configuration methods.

Granting table access with postOps()

The following example SQL file defines a post operation to grant access to the table to specific groups, after the table is created. It makes use of the self() method, that returns a fully qualified table name of the current table.

--js postOps(`grant select on ${self()} to group "allusers@dataform.co"`);
select * from ...
To use the self() method within the argument provided to postOps, you must use JavaScript's ES6 template string syntax, wrapping it in back-ticks `.

Multiple arguments can be provided to postOps and preOps, each a seperate query, which will be executed in the order given. You can also use the multi-line configuration statement (below) to make this cleaner:

/*js
postOps([
  `grant select on ${self()} to group "allusers@dataform.co"`,
  `grant select on ${self()} to group "otherusers@dataform.co"`,
]);
*/

Overriding the output schema

By default, all tables are created within the default schema chosen when initializing a project, usually dataform.

This can be overriden by changing the file name to have a fully qualified table name. For example, to create a table called example in the schema other_schema, create a file with the name definitions/other_schema.example.sql:

select 1 as test

This will override the output schema of the table to other_schema. To reference such tables, use the fully qualified name of the table, for example:

select * from ${ref("other_schema.table")}

Configuration statements

There are a few equivelant ways of using JS configuration statements covered below.

Single line statements

Each line starting with --js will be interpreted as JavaScript.

--js type("table");
--js disabled(true);

Multi-line statements

For writing JavaScript over multiple lines, create a comment block that starts with /*js and ends with */:

/*js
type("table");
disabled(true);
*/

Setting multiple properties with config()

Most settings for a table can be set in a single call to the config(...) method, and are passed in via a JSON object:

/*js
config({
  type: "table",
  disabled: true,
});
*/

Warehouse specific configuration

BigQuery

For more information on configuring BigQuery tables, such as enabling table partitioning, check out the BigQuery guide.

Redshift

For more information on configuring Redshift tables, such as sort keys and dist keys, check out the Redshift guide.

Further reading