Configure your project

Learn how to configure your Dataform project.

Introduction

A Dataform project is primarily configured through the dataform.json file that is created at the top level of your project directory.

In addition, package.json is used to control NPM dependency versions, including the current Dataform version.

dataform.json

This file contains project-level settings, such as the warehouse type and default schema names, which Dataform uses when compiling your project into final SQL.

The following is an example of the dataform.json file for a BigQuery project:

dataform.json
{
  "warehouse": "bigquery",
  "defaultDatabase": "my-gcp-project-id",
  "defaultSchema": "dataform",
  "assertionsSchema": "dataform_assertions"
}

All of these configuration settings are accessible in your project code as properties of the dataform.projectConfig object. For example:

definitions/my_view.sqlx
config { type: "view" }
select '${when(
  dataform.projectConfig.warehouse === "bigquery",
  "warehouse is set to bigquery!",
  "warehouse is not set to bigquery!"
)}' as message
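The ${...} expression in a SQLX file is JavaScript evaluated at compile time, and its result is spliced into the SQL text. The following Node.js sketch approximates that step outside Dataform. The when() function here is a re-implementation written only for illustration (assuming it returns its second argument when the condition is true and its third otherwise, defaulting to an empty string), and projectConfig is a hand-written stand-in for dataform.projectConfig built from the example dataform.json above.

```javascript
// Illustrative re-implementation of a `when`-style helper (assumed behavior,
// not @dataform/core's source): pick one of two values based on a condition.
function when(condition, trueValue, falseValue = "") {
  return condition ? trueValue : falseValue;
}

// Hypothetical stand-in for dataform.projectConfig, copied from dataform.json.
const projectConfig = {
  warehouse: "bigquery",
  defaultDatabase: "my-gcp-project-id",
  defaultSchema: "dataform",
  assertionsSchema: "dataform_assertions",
};

// Evaluate the expression and splice its result into the SQL text,
// wrapped in a string literal so the generated SQL is valid.
const sql = `select '${when(
  projectConfig.warehouse === "bigquery",
  "warehouse is set to bigquery!",
  "warehouse is not set to bigquery!"
)}' as message`;

console.log(sql); // select 'warehouse is set to bigquery!' as message
```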

Configure default schema names

Dataform aims to create all objects under a single schema (or dataset, in BigQuery) in your warehouse. This schema is usually called dataform, but you can change it by setting the defaultSchema property to another value. For example, to change it to mytables, update the configuration file as follows:

dataform.json
{
  ...
  "defaultSchema": "mytables",
  ...
}
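As a rough illustration of where objects end up, the following sketch combines defaultDatabase, defaultSchema and an action name into a BigQuery-style fully qualified name. qualifiedName is a hypothetical helper, not part of @dataform/core, and it assumes the action does not override its schema in its own config block.

```javascript
// Hypothetical helper: combine defaultDatabase, defaultSchema and an action
// name into a BigQuery-style fully qualified name. Not part of @dataform/core.
function qualifiedName(projectConfig, actionName) {
  // Drop settings that are not defined (e.g. a project with no defaultDatabase).
  const parts = [projectConfig.defaultDatabase, projectConfig.defaultSchema, actionName]
    .filter((part) => typeof part === "string" && part.length > 0);
  // BigQuery accepts a whole project.dataset.table path inside one pair of backticks.
  return "`" + parts.join(".") + "`";
}

// Stand-in for dataform.projectConfig after setting defaultSchema to mytables.
const projectConfig = {
  warehouse: "bigquery",
  defaultDatabase: "my-gcp-project-id",
  defaultSchema: "mytables",
};

console.log(qualifiedName(projectConfig, "my_view")); // `my-gcp-project-id.mytables.my_view`
```

Under these assumptions, a view defined in definitions/my_view.sqlx would be expected in the mytables dataset rather than in dataform.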

Configure custom compilation variables

You can inject custom variables into project compilation by adding a vars object to dataform.json:

dataform.json
{
  ...
  "vars": {
    "myVariableName": "myVariableValue"
  },
  ...
}

As with project configuration settings, you can access these in your project code. For example:

definitions/my_view.sqlx
config { type: "view" }
select '${when(
  dataform.projectConfig.vars.myVariableName === "myVariableValue",
  "myVariableName is set to myVariableValue!",
  "myVariableName is not set to myVariableValue!"
)}' as message
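Reading a variable directly, as above, leaves you to handle the case where it is not defined in dataform.json. One way to do that is a small fallback helper. getVar below is hypothetical (you might keep something like it in a JavaScript file under includes/ and pass it dataform.projectConfig), and projectConfig is a stand-in built from the vars example above.

```javascript
// Hypothetical helper: read a compilation variable from a projectConfig-shaped
// object, falling back to a default when it is not defined.
function getVar(projectConfig, name, fallback) {
  // Treat a missing vars object the same as an empty one.
  const vars = (projectConfig && projectConfig.vars) || {};
  return Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : fallback;
}

// Stand-in for dataform.projectConfig, using the vars example above.
const projectConfig = {
  defaultSchema: "dataform",
  vars: { myVariableName: "myVariableValue" },
};

console.log(getVar(projectConfig, "myVariableName", "unset")); // myVariableValue
console.log(getVar(projectConfig, "otherVariable", "unset")); // unset
```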

Control query concurrency

Dataform executes as many queries as possible in parallel, using per-warehouse default query concurrency limits. If you want to limit the number of queries that can run concurrently during a Dataform run, set the concurrentQueryLimit property:

dataform.json
{
  ...
  "concurrentQueryLimit": 10,
  ...
}
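To illustrate what a concurrency limit means in practice, the following Node.js sketch runs a list of asynchronous tasks with at most limit of them in flight at once, the same idea concurrentQueryLimit applies to warehouse queries. runWithLimit and the simulated queries are hypothetical; this is a generic worker-pool pattern, not Dataform's actual executor.

```javascript
// Generic worker-pool sketch: run async tasks with at most `limit` in flight.
// Illustrates the idea behind concurrentQueryLimit; not Dataform's executor.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start
  let active = 0; // tasks currently running
  let maxActive = 0; // peak concurrency, recorded for demonstration

  async function worker() {
    // Each worker keeps pulling tasks until none are left.
    while (next < tasks.length) {
      const index = next++;
      active++;
      maxActive = Math.max(maxActive, active);
      try {
        results[index] = await tasks[index]();
      } finally {
        active--;
      }
    }
  }

  // Start at most `limit` workers; each runs one task at a time.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return { results, maxActive };
}

// Simulated "queries": each resolves with its id after 10 ms.
const fakeQueries = Array.from({ length: 25 }, (_, id) => () =>
  new Promise((resolve) => setTimeout(() => resolve(id), 10))
);

runWithLimit(fakeQueries, 10).then(({ results, maxActive }) => {
  console.log(results.length, maxActive); // 25 10
});
```

Because each worker starts its next task only after the current one finishes, the limit holds without explicit locking; JavaScript's single-threaded event loop makes the next++ step safe.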

package.json

package.json is a standard NPM package file that you can use to include JavaScript packages in your project.

Most importantly, it specifies your project's Dataform version, which you can update by editing this file or by running npm install or npm update in your project directory.

If you develop projects on Dataform Web, this is managed for you and can be largely ignored.
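If you work locally and want to check which @dataform/core version your project declares, a few lines of Node.js are enough. This sketch reads package.json from the current directory; dataformCoreVersion is a hypothetical helper, and it reports the declared version range rather than the installed version (npm ls @dataform/core shows the installed one).

```javascript
// Sketch: print the @dataform/core version declared in ./package.json.
// dataformCoreVersion is a hypothetical helper written for this example.
const fs = require("node:fs");
const path = require("node:path");

// Return the version range declared for @dataform/core, or null if absent.
function dataformCoreVersion(pkg) {
  for (const section of ["dependencies", "devDependencies"]) {
    const deps = pkg[section] || {};
    if (typeof deps["@dataform/core"] === "string") {
      return deps["@dataform/core"];
    }
  }
  return null;
}

const pkgPath = path.join(process.cwd(), "package.json");
if (fs.existsSync(pkgPath)) {
  const pkg = JSON.parse(fs.readFileSync(pkgPath, "utf8"));
  console.log(dataformCoreVersion(pkg) ?? "@dataform/core is not declared");
}
```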

Updating Dataform to the latest version

All Dataform projects depend on the @dataform/core NPM package. If you are developing your project locally and would like to upgrade your Dataform version, run the following command:

npm update @dataform/core

If you use the Dataform command-line tool, you may also want to upgrade your globally installed Dataform version:

npm update -g @dataform/cli

What's next

Publish data tables and views

Learn how to configure, publish and document data tables in your warehouse.

SQLX

Learn about the structure and features of SQLX files.

Test data quality with assertions

Learn how to test data quality with assertions.

Declare external datasets with declarations

Learn how to declare external datasets with declarations.

Write custom SQL operations

Learn how to define custom SQL operations in Dataform.

Power your code with JavaScript

Learn how you can use JavaScript to re-use code across your scripts and define several actions.

Organise your project with tags

Learn how to organise your project with tags.

Run unit tests on your queries

Learn how to run unit tests on your queries.

Configure CI/CD

Configure continuous integration/deployment workflows for your Dataform project.
