Use JavaScript files

Learn how to define several actions in a single Javascript file.

Introduction

Dataform provides a JavaScript API for defining all actions via code instead of creating individual SQL files for each action. This can be useful in advanced use cases.

To use the JavaScript API, create a .js file anywhere in the definitions/ folder of your project. This code will be run during project compilation.

The JavaScript API exposes functions that create any of the actions that you would otherwise define in a SQL file:

  • Datasets
  • Declarations
  • Assertions
  • Operations
  • Tests

These are regular JavaScript (ES5) files that can contain arbitrary code: loops, functions, constants, etc.

Defining actions in JavaScript

The following file is a simple example of creating actions in JavaScript:

definitions/example.js :

1publish("table1").query("SELECT 1 AS test");
2
3declare({
4  schema: "rawdata",
5  name: "input1"
6});
7
8assert("assertion1").query("SELECT * FROM source_table WHERE value IS NULL");
9
10operate("operation1").queries("INSERT INTO some_table (test) VALUES (2)");
11
12test("test1")
13  .dataset("some_table")
14  .input("input_data", "SELECT 1 AS test")
15  .expect("SELECT 1 AS test");

This example creates a dataset, a declaration, an assertion, an operation, and a test in a single file, which if written in SQL would have required 5 seperate files.

Setting properties on actions

Each of the global methods - publish() , declare() , operate() , assert() , and test() - return an object that can be used to configure that action. The API follows a builder syntax which can be seen in the following example:

1publish("table1")
2  .query("SELECT 1 AS test") // Defines the query
3  .type("table") // Sets the query's type
4  .dependencies(["other_table"]) // Specifies dataset dependencies
5  .descriptor({
6    test: "Value is 1" // Describes fields in the dataset
7  });

Multiple configuration properties can also be set using the config() method, or alternatively simply passed as a second argument to the method:

1publish("table1", {
2  type: "table",
3  dependencies: ["other_table"],
4  descriptor: {
5    test: "Value is 1"
6  }
7});

Learn more about configuration options for each type of action:

Using built-in functions

When writing .sqlx files, Dataform makes a number of built-in functions such as ref() and self() available to use within the main query. For example:

definitions/example.sqlx
1config { type: "view" }
2SELECT * FROM ${ref("other_table")}

The ref() function is made available for the script to use automatically. Note that this is not the case when using the JavaScript API.

To use these functions in JavaScript, API methods - such as query() - take a Contextable argument. Instead of providing a string as the argument to the API method, you can pass a function whose only parameter is a context object. This object exposes the built-in functions for JavaScript code to use.

For example, the above example written in JavaScript (making handy use of JavaScript template strings), in definitions/example.js :

1publish("example").query(ctx => `SELECT * FROM ${ctx.ref("other_table")}`);
Note that when using JavaScript template strings - which allow use of the ${"{}"} {" "} syntax - the string in question must be wrapped in backticks ` instead of single{" "} ' or double " quotes.

The following methods and configuration options accept a function taking a Contextable argument as in the above example:

  • query()
  • preOps()
  • postOps()

Referencing includes files

Just like in .sqlx files, you can reference any includes function, macro or constant inside a .js file. However, the syntax differs slightly and varies depending on the includes file location.

Top-Level includes

When importing from a top-level includes file, you can simply reference the file name when declaring your variables.

For example, if you would like to reference SERVICE_NAME and SERVICE_ID from the includes/service.js file, you can reference the file name directly. This is because Dataform automatically imports all top-level includes files.

1const {SERVICE_NAME, SERVICE_ID} = service;
2...

Nested includes

When importing from a nested includes file, you must declare these constants using the JavaScript builtin require function

For example, if you would like to reference SERVICE_NAME and SERVICE_ID from the includes/all_services/service.js file, you must use require .

1const {SERVICE_NAME, SERVICE_ID} = require("includes/all_services/service.js");
2...

Regardless of whether the source file is top-level or nested, once declared, these variables are available without any additional notation (i.e.. you don't need to wrap the constants in ${} ).

Example: Dynamic dataset generation

One of the most common use cases for using the JavaScript API is to perform a similar action repeatedly.

For example, imagine you have several datasets, all of which have a user_id field. Perhaps you would like to create a view of each dataset with certain blacklisted user IDs removed. You could perform this action across all datasets using a JavaScript forEach() statement:

definitions/blacklist_views.js
1const datasetNames = ["user_events", "user_settings", "user_logs"];
2
3datasetNames.forEach(datasetNames => {
4  publish(datasetNames + "_blacklist_removed").query(
5    ctx => `
6      SELECT * FROM ${ctx.ref(tableName)}
7      WHERE user_id NOT IN (
8        SELECT user_id
9        FROM ${ctx.ref("blacklisted_user_ids")}
10      )`
11  );
12});

This script would create 3 new datasets as views named user_events_blacklist_removed , user_settings_blacklist_removed , and user_logs_blacklist_removed that don't contain any of the blacklisted user IDs.

What's next

Re-use code with includes

Learn how to re-use code across your project with includes.

Sitemap