JavaScript API

Introduction

Dataform provides a JavaScript API for defining all actions via code, rather than through creating individual SQL files. This can be useful in advanced use cases.

To use the JavaScript API, create a .js file anywhere in the definitions/ folder of your project. This code will be run during project compilation.

The JavaScript API provides methods that create any of the actions that you would otherwise define in a SQL file:

  • Publish a table
  • Define custom SQL operations
  • Write tests against data

These are regular JavaScript (ES5) files that can contain arbitrary code, for loops, functions, constants etc.

Defining actions via JavaScript

The following file shows a simple example of creating the 3 primary actions via JavaScript:

definitions/example.js:

publish("table1")
  .query("select 1 as test");

assert("assertion1")
  .query("select * from source_table where value is null");

operate("operation1")
  .query("insert into some_table (test) values (2)");

This creates a table, an assertion, and an operation in a single file, which would have otherwise required 3 seperate files to be created.

Setting properties on actions

Each of the global methods, publish(), operate(), assert() returns an object that can then be used to configure that action. The API follows a builder syntax which can be seen in the following example:

publish("table1")
  .query("select 1 as test")    // Defines the main query
  .type("table")                // Set's the type of the table
  .dependecies(["other_table"]) // Add's dependencies
  .descriptor({
    "test": "Value is 1"        // Describes fields in the table
  });

Multiple properties can also be set using the config() method, or as a shorthand, passed as a second argument to the method:

publish("table1", {
  query: "select 1 as test",
  type: "table",
  dependecies: ["other_table"],
  descriptor: {
    "test": "Value is 1"
  }
});

To see all the settings that can be configured for each action type, refer to the relevant reference below:

Using built-in functions such as ref()

When writing .sql files, Dataform makes a number of built in functions such as ref() and self() available to use within the main query. For example the file definitions/example.sql:

select * from ${ref("other_table")}

The ref() function is made available automatically. This is not the case when using the JavaScript API.

To use these functions, certain methods such as query() accept a contextable argument. Instead of providing a string as the argument, a function can be provided that will be passed a context object which has all these methods available on it.

To mirror the behaviour above, you can use a contextable argument to the query() method and take advantage of JavaScript template strings:

definitions/example.js:

publish("example")
  .query(ctx => `select ${ctx.ref("other_table")}`);
In order to use the ${} syntax to call methods inline in a string, the argument must be wrapped in backticks` instead of regular quotes ".

The following methods and configuration options accept a contextable argument and be called in a similar way:

  • query()
  • where()
  • preOps()
  • postOps()

Contextable arguments can also be provided via the config() API, again mirroring the behaviour above:

publish("example")
  .config({
    query: ctx => `select ${ctx.ref("other_table")}`
  });

Creating several tables at once

One of the most common use cases for using the JavaScript API is to perform a similar action several times.

For example, imagine you have several tables, all of which have a field user_id. You would like to create a view of each table with certain blacklisted user IDs removed. You can perform this across all tables using a JavaScript forEach statement:

definitions/blacklist_views.js:

const tableNames = [
  "user_events",
  "user_settings",
  "user_logs",
  "user_friends"
];

tableNames.forEach(tableName => {
  publish(tableName + "_blacklist_removed")
    .query(ctx => `
      select * from ${ctx.ref(tableName)}
      where user_id not in (
        select user_id
        from ${ctx.ref("blacklisted_user_ids")}
      )`);
});

This would create 4 new views, user_events_blacklist_removed, etc, that don't contain any of the users with blacklisted IDs.