Dataform provides a JavaScript API for defining all actions via code instead of creating individual SQL files for each action. This can be useful in advanced use cases.
To use the JavaScript API, create a .js
file anywhere in the definitions/
folder of your project. This code will be run during project compilation.
The JavaScript API exposes functions that create any of the actions that you would otherwise define in a SQL file:
These are regular JavaScript (ES5) files that can contain arbitrary code: loops, functions, constants, etc.
The following file is a simple example of creating actions in JavaScript:
definitions/example.js
:
1publish("table1").query("SELECT 1 AS test"); 2 3declare({ 4 schema: "rawdata", 5 name: "input1" 6}); 7 8assert("assertion1").query("SELECT * FROM source_table WHERE value IS NULL"); 9 10operate("operation1").queries("INSERT INTO some_table (test) VALUES (2)"); 11 12test("test1") 13 .dataset("some_table") 14 .input("input_data", "SELECT 1 AS test") 15 .expect("SELECT 1 AS test");
This example creates a dataset, a declaration, an assertion, an operation, and a test in a single file, which if written in SQL would have required 5 seperate files.
Each of the global methods - publish()
, declare()
, operate()
, assert()
, and test()
- return an object that can be used to configure that action. The API follows a builder syntax which can be seen in the following example:
1publish("table1") 2 .query("SELECT 1 AS test") // Defines the query 3 .type("table") // Sets the query's type 4 .dependencies(["other_table"]) // Specifies dataset dependencies 5 .descriptor({ 6 test: "Value is 1" // Describes fields in the dataset 7 });
Multiple configuration properties can also be set using the config()
method, or alternatively simply passed as a second argument to the method:
1publish("table1", { 2 type: "table", 3 dependencies: ["other_table"], 4 descriptor: { 5 test: "Value is 1" 6 } 7});
Learn more about configuration options for each type of action:
publish()
(Datasets)declare()
(Declarations)assert()
(Assertions)operate()
(Operations)test()
(Tests)When writing .sqlx
files, Dataform makes a number of built-in functions such as ref()
and self()
available to use within the main query. For example:
definitions/example.sqlx1config { type: "view" } 2SELECT * FROM ${ref("other_table")}
The ref()
function is made available for the script to use automatically. Note that this is not the case when using the JavaScript API.
To use these functions in JavaScript, API methods - such as query()
- take a Contextable
argument. Instead of providing a string as the argument to the API method,
you can pass a function whose only parameter is a context
object. This object exposes the built-in functions for JavaScript code to use.
For example, the above example written in JavaScript (making handy use of JavaScript template strings), in definitions/example.js
:
1publish("example").query(ctx => `SELECT * FROM ${ctx.ref("other_table")}`);
The following methods and configuration options accept a function taking a Contextable
argument as in the above example:
query()
preOps()
postOps()
Just like in .sqlx
files, you can reference any includes
function, macro or constant inside a .js
file. However, the syntax differs slightly and varies depending on the includes
file location.
When importing from a top-level includes
file, you can simply reference the file name when declaring your variables.
For example, if you would like to reference SERVICE_NAME
and SERVICE_ID
from the includes/service.js
file, you can reference the file name directly. This is because Dataform automatically imports
all top-level includes
files.
1const {SERVICE_NAME, SERVICE_ID} = service; 2...
When importing from a nested includes
file, you must declare these constants using the JavaScript builtin require
function
For example, if you would like to reference SERVICE_NAME
and SERVICE_ID
from the includes/all_services/service.js
file, you must use require
.
1const {SERVICE_NAME, SERVICE_ID} = require("includes/all_services/service.js"); 2...
Regardless of whether the source file is top-level or nested, once declared, these variables are available without any additional notation (i.e.. you don't need to wrap the constants in ${}
).
One of the most common use cases for using the JavaScript API is to perform a similar action repeatedly.
For example, imagine you have several datasets, all of which have a user_id
field. Perhaps you would like to create a view of each dataset with certain blacklisted user IDs removed.
You could perform this action across all datasets using a JavaScript forEach()
statement:
definitions/blacklist_views.js1const datasetNames = ["user_events", "user_settings", "user_logs"]; 2 3datasetNames.forEach(datasetNames => { 4 publish(datasetNames + "_blacklist_removed").query( 5 ctx => ` 6 SELECT * FROM ${ctx.ref(tableName)} 7 WHERE user_id NOT IN ( 8 SELECT user_id 9 FROM ${ctx.ref("blacklisted_user_ids")} 10 )` 11 ); 12});
This script would create 3 new datasets as views named user_events_blacklist_removed
, user_settings_blacklist_removed
, and user_logs_blacklist_removed
that don't contain any of the blacklisted user IDs.