Configure your project

Learn how to configure your Dataform project.

Introduction

A Dataform project is primarily configured through the dataform.json file that is created at the top level of your project directory.

In addition, package.json is used to control NPM dependency versions, including the current Dataform version.

dataform.json

This file contains information about the project. These settings, such as the warehouse type, default schema names, and so on, are used to compile final SQL.

The following is an example of the dataform.json file for a BigQuery project:

{
  "warehouse": "bigquery",
  "defaultDatabase": "my-gcp-project-id",
  "defaultSchema": "dataform",
  "assertionsSchema": "dataform_assertions"
}

Configure default schema names

Dataform aims to create all objects under a single schema (or dataset in BigQuery) in your warehouse. This schema is usually called dataform, but you can change it by setting the defaultSchema property to another value. For example, to change it to mytables, update the configuration file as follows:

{
  ...
  "defaultSchema": "mytables",
  ...
}
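
To make the effect of defaultSchema concrete, the sketch below shows how a default database and schema could be combined with a table name into a fully qualified BigQuery reference. The function name qualifiedName and the table name orders are hypothetical, and this illustrates the idea rather than Dataform's actual compilation code.

```javascript
// Hypothetical helper, not part of the Dataform API.
// Joins the project defaults and a table name into a
// BigQuery-style `database.schema.name` reference.
function qualifiedName(projectConfig, name) {
  const parts = [
    projectConfig.defaultDatabase,
    projectConfig.defaultSchema,
    name,
  ];
  return "`" + parts.join(".") + "`";
}

// Mirrors the dataform.json settings used on this page.
const projectConfig = {
  warehouse: "bigquery",
  defaultDatabase: "my-gcp-project-id",
  defaultSchema: "mytables",
};

console.log(qualifiedName(projectConfig, "orders"));
// prints `my-gcp-project-id.mytables.orders`
```

Changing defaultSchema in the configuration object changes only the middle part of the reference, so every object that relies on the default moves to the new schema together.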

Control query concurrency

Dataform executes as many queries as possible in parallel, using per-warehouse default query concurrency limits. If you would like to limit the number of queries that may run concurrently during the course of a Dataform run, you can set the concurrentQueryLimit property:

{
  ...
  "concurrentQueryLimit": 10,
  ...
}

Enable run caching to cut warehouse costs

Dataform has a built-in run caching feature. Once enabled, Dataform only runs actions (datasets, assertions, or operations) that might update the data in the action's output.

For example, if a dataset's SQL definition and dependency datasets are unchanged (since the previous run), re-creating that dataset will not update the actual data. In this case, with run caching enabled, Dataform would not run the relevant action.

Run caching is currently supported only for BigQuery projects, and requires @dataform/core version 1.6.11 or later.
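
Version strings do not compare correctly as plain text ("1.6.9" sorts after "1.6.11" alphabetically), so a part-by-part numeric comparison is needed when checking the minimum. The following is a minimal sketch; the helper name isAtLeast is hypothetical, and it assumes an exact major.minor.patch string with no range prefix (such as ^) or pre-release suffix.

```javascript
// Hypothetical helper: compares dotted numeric versions part by part.
// Assumes plain "major.minor.patch" strings with no prefix or suffix.
function isAtLeast(version, minimum) {
  const a = version.split(".").map(Number);
  const b = minimum.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const left = a[i] || 0;
    const right = b[i] || 0;
    if (left !== right) return left > right;
  }
  return true; // the two versions are equal
}

const RUN_CACHE_MINIMUM = "1.6.11";

console.log(isAtLeast("1.6.9", RUN_CACHE_MINIMUM)); // false
console.log(isAtLeast("1.6.11", RUN_CACHE_MINIMUM)); // true
console.log(isAtLeast("1.10.0", RUN_CACHE_MINIMUM)); // true
```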

To enable run caching on your project, add the following flag to your dataform.json file:

{
  ...
  "useRunCache": true,
  ...
}

Run caching enforces tighter compilation requirements on your project. In particular, run caching depends on Dataform having accurate information about your project's dependency graph, so all dependencies must be declared explicitly with ref() or dependencies.

Actions with zero dependencies must either be changed to depend on declarations, or must explicitly declare whether or not they are hermetic, using the hermetic configuration option.

Any actions which depend on data from a source that has not been explicitly declared as a dependency should be explicitly marked as not hermetic, by setting hermetic: false on that action. This notifies Dataform that the action reads data from an undeclared dependency, and thus the action should always run.
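
The rules above can be summarized as a small decision function. This is an illustrative model only: the function shouldRun, the shape of the action objects, and the action names are all hypothetical, and Dataform's real implementation may track changes differently.

```javascript
// Illustrative model of the run caching decision; not Dataform's real code.
// `previousRun` maps action names to the definition used in the last run.
// `updatedDependencies` is the set of dependency names whose data changed.
function shouldRun(action, previousRun, updatedDependencies) {
  // A non-hermetic action reads from undeclared sources, so it always runs.
  if (action.hermetic === false) return true;
  // With no record of an earlier run there is nothing to reuse.
  const previous = previousRun[action.name];
  if (!previous) return true;
  // The SQL definition changed since the previous run.
  if (previous.definition !== action.definition) return true;
  // Otherwise run only if a declared dependency has new data.
  return action.dependencies.some((dep) => updatedDependencies.has(dep));
}

const dailyOrders = {
  name: "daily_orders",
  definition: "SELECT order_date, COUNT(*) AS n FROM orders GROUP BY 1",
  dependencies: ["orders"],
};
const previousRun = {
  daily_orders: { definition: dailyOrders.definition },
};

// Nothing changed, so the action can be skipped.
console.log(shouldRun(dailyOrders, previousRun, new Set())); // false
// The "orders" dependency was updated, so the action must run.
console.log(shouldRun(dailyOrders, previousRun, new Set(["orders"]))); // true
// Marked as not hermetic, so the action always runs.
console.log(shouldRun({ ...dailyOrders, hermetic: false }, previousRun, new Set())); // true
```

The model also shows why an accurate dependency graph matters: if an action reads a table that is neither listed in its dependencies nor flagged with hermetic: false, the last check cannot see that its input changed and the action would be skipped incorrectly.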

package.json

This is a standard NPM package file which may be used to include JavaScript packages within your project.

Most importantly, your Dataform version is specified here, and can be updated by changing this file or running the npm install or npm update commands inside your project directory.

If you develop projects on Dataform Web, this is managed for you and can be largely ignored.

Updating Dataform to the latest version

All Dataform projects depend on the @dataform/core NPM package. If you are developing your project locally and would like to upgrade your Dataform version, run the following command:

npm update @dataform/core

If you use the dataform command line tool, you may also wish to upgrade your globally installed Dataform version:

npm update -g @dataform/cli

What's next

Publish data tables and views

Learn how to configure, publish and document data tables in your warehouse.

SQLX

Learn about the structure and features of SQLX files.

Test data quality with assertions

Learn how to test data quality with assertions.

Declare external datasets with declarations

Learn how to declare external datasets with declarations.

Write custom SQL operations

Learn how to define custom SQL operations in Dataform.

Power your code with JavaScript

Learn how you can use JavaScript to re-use code across your scripts and define several actions.

Organise your project with tags

Learn how to organise your project with tags.

Run unit tests on your queries

Learn how to run unit tests on your queries.

Configure CI/CD

Configure continuous integration/deployment workflows for your Dataform project.
