A Dataform project is primarily configured through the `dataform.json` file that is created at the top level of your project directory. In addition, `package.json` is used to control NPM dependency versions, including the current Dataform version.
The `dataform.json` file contains information about the project. Its settings, such as the warehouse type and default schema names, are used to compile the final SQL.
The following is an example of the `dataform.json` file for a BigQuery project:

```json
{
  "warehouse": "bigquery",
  "defaultDatabase": "my-gcp-project-id",
  "defaultSchema": "dataform",
  "assertionsSchema": "dataform_assertions"
}
```
All of these configuration settings are accessible in your project code as properties of the `dataform.projectConfig` object. For example:

definitions/my_view.sqlx

```sqlx
config { type: "view" }
select ${when(
  dataform.projectConfig.warehouse === "bigquery",
  "warehouse is set to bigquery!",
  "warehouse is not set to bigquery!"
)}
```
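To make the selection logic in the snippet above concrete, here is a minimal JavaScript sketch that runs outside Dataform. The `projectConfig` object and the local `when` function are stand-ins written for illustration only; they assume a helper that returns its second argument when the condition holds and its third otherwise, and they are not Dataform's own implementation.

```javascript
// Hypothetical stand-in for the settings parsed from dataform.json.
const projectConfig = {
  warehouse: "bigquery",
  defaultDatabase: "my-gcp-project-id",
  defaultSchema: "dataform",
  assertionsSchema: "dataform_assertions",
};

// Illustrative when-style helper (assumption: returns trueCase if the
// condition holds, otherwise falseCase).
function when(condition, trueCase, falseCase = "") {
  return condition ? trueCase : falseCase;
}

// Pick a message based on the configured warehouse type.
const message = when(
  projectConfig.warehouse === "bigquery",
  "warehouse is set to bigquery!",
  "warehouse is not set to bigquery!"
);

console.log(message);
```

Because the condition is evaluated at compile time, only the selected string ends up in the generated SQL.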
Dataform aims to create all objects under a single schema (or dataset in BigQuery) in your warehouse. This schema is called `dataform` by default, but you can change it by setting the `defaultSchema` property to another value. For example, to change it to `mytables`, update the configuration file as follows:

```json
{
  ...
  "defaultSchema": "mytables",
  ...
}
```
You can inject custom variables into project compilation:

```json
{
  ...
  "vars": {
    "myVariableName": "myVariableValue"
  },
  ...
}
```
As with project configuration settings, you can access these variables in your project code. For example:

definitions/my_view.sqlx

```sqlx
config { type: "view" }
select ${when(
  dataform.projectConfig.vars.myVariableName === "myVariableValue",
  "myVariableName is set to myVariableValue!",
  "myVariableName is not set to myVariableValue!"
)}
```
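When a variable may be absent from the configuration, it can help to read it through a small helper that supplies a fallback. The sketch below is plain JavaScript with a hypothetical `getVar` function and a hand-written `projectConfig` object; neither is part of Dataform, and the fallback behaviour is an assumption made for illustration.

```javascript
// Hypothetical project configuration, mirroring the "vars" block above.
const projectConfig = {
  vars: { myVariableName: "myVariableValue" },
};

// Hypothetical helper: return a compilation variable, or the fallback
// when the variable has not been defined in the configuration.
function getVar(config, name, fallback) {
  const vars = config.vars || {};
  return Object.prototype.hasOwnProperty.call(vars, name)
    ? vars[name]
    : fallback;
}

console.log(getVar(projectConfig, "myVariableName", "unset"));
console.log(getVar(projectConfig, "otherVariable", "unset"));
```

A helper like this could live in a JavaScript include file, so that every definition resolves missing variables in the same way.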
Dataform executes as many queries as possible in parallel, using per-warehouse default query concurrency limits. To limit the number of queries that may run concurrently during a Dataform run, set the `concurrentQueryLimit` property:

```json
{
  ...
  "concurrentQueryLimit": 10,
  ...
}
```
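As a rough illustration of what a concurrency limit means in practice, the following JavaScript sketch runs a list of asynchronous tasks with at most `limit` of them in flight at once. It is a generic worker-pool pattern, not Dataform's scheduler; `runWithLimit` and `fakeQuery` are hypothetical names, and the "queries" are simulated with timers.

```javascript
// Run async tasks with at most `limit` in flight at any time.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Simulated queries that record the peak number running at once.
let running = 0;
let peak = 0;
const fakeQuery = (id) => async () => {
  running++;
  peak = Math.max(peak, running);
  await new Promise((resolve) => setTimeout(resolve, 10));
  running--;
  return id;
};

const done = runWithLimit([1, 2, 3, 4, 5].map(fakeQuery), 2).then(
  (results) => {
    console.log(results, "peak concurrency:", peak);
    return { results, peak };
  }
);
```

With a limit of 2, the five simulated queries still all complete and their results keep their original order; only the number running simultaneously is capped.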
The `package.json` file is a standard NPM package file, which you can use to include JavaScript packages in your project.
Most importantly, it specifies your Dataform version, which you can update by editing the file or by running the `npm install` or `npm update` commands inside your project directory.
If you develop projects on Dataform Web, this file is managed for you and can largely be ignored.
All Dataform projects depend on the `@dataform/core` NPM package. If you are developing your project locally and would like to upgrade your Dataform version, run the following command:

```bash
npm update @dataform/core
```
If you use the `dataform` command line tool, you may also wish to upgrade your globally installed Dataform version:

```bash
npm update -g @dataform/cli
```