Configure CI/CD

Configure continuous integration/deployment workflows for your Dataform project.

Introduction

Continuous integration / continuous deployment (CI/CD) workflows help prevent code changes from unintentionally breaking your Dataform project.

Typically, these workflows are configured to run on every commit to your Git repository, automatically checking that the commit doesn't break your code.

Dataform CLI Docker image

Dataform distributes a Docker image which can be used to run the equivalent of Dataform CLI commands.

For most CI/CD tools, this Docker image is what you'll use to run your automated checks.
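To try the same image on your own machine before wiring it into a CI/CD tool, something like the following sketch may help. It assumes the image's entrypoint is the Dataform CLI (the GitHub Actions workflow on this page passes only `install` and `compile` as arguments, which suggests so) and that the CLI operates on the current working directory. The function name `dataform_docker` and the `/workspace` mount path are hypothetical choices made for this illustration.

```shell
# Hypothetical wrapper: forwards its arguments to the Dataform CLI inside the image.
# Assumption: the image's entrypoint is the Dataform CLI, so "$@" becomes the CLI command.
dataform_docker() {
  docker run --rm \
    --volume "$PWD":/workspace \
    --workdir /workspace \
    dataformco/dataform:1.6.11 "$@"
}

# Usage (requires Docker and a Dataform project in the current directory):
#   dataform_docker install
#   dataform_docker compile
```

Mounting the project directory into the container lets the CLI read your project files and write installed dependencies back to your working directory.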

Using GitHub Actions

If you host your Dataform Git repository on GitHub, you can use GitHub Actions to run CI/CD workflows. For more about configuring workflows, see the GitHub Actions documentation.

GitHub workflows are defined in YAML and must exist in a .github/workflows directory in your Git repository. Once the workflow file is added to this directory, GitHub will automatically run it on the events you specify in the workflow configuration.

For example, in a .github/workflows/dataform.yaml file:

name: CI

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  compile:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      - name: Install project dependencies
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: install
      - name: Run dataform compile
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: compile

This workflow is configured to run on every commit to the master branch and for all pull requests to the master branch.

It contains three steps:

  1. Check out the project code into the working directory
  2. Use the Dataform Docker image to run dataform install, installing project dependencies into the working directory
  3. Use the Dataform Docker image to run dataform compile, checking that the project still compiles correctly

If any of these steps fails, the workflow fails. GitHub shows the result in its UI, along with logs to help you understand what, if anything, broke.

Running commands which require warehouse credentials

You may want to run Dataform CLI commands that require credentials to access your data warehouse (for example, dataform test or dataform run). The Dataform CLI expects these credentials in a .df-credentials.json file, but committing that file to Git would be insecure.

Fortunately, GitHub Actions supports "secrets". A GitHub secret is configured in your GitHub repository settings. Once configured, you can use this secret to decrypt your warehouse credentials as part of a GitHub Actions workflow.

  1. Create your .df-credentials.json file by following the Dataform CLI instructions for creating a credentials file
  2. Encrypt your credentials file with a secret passphrase: gpg --symmetric --cipher-algo AES256 .df-credentials.json
  3. Commit the encrypted .df-credentials.json.gpg file (never the unencrypted .df-credentials.json file)
  4. Add the secret passphrase to your GitHub repository as a secret named CREDENTIALS_GPG_PASSPHRASE
  5. Edit your GitHub Actions workflow file to decrypt the credentials file:
name: CI

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  compile:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      - name: Install NPM dependencies
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: install
      - name: Decrypt dataform credentials
        run: gpg --quiet --batch --yes --decrypt --passphrase="$CREDENTIALS_GPG_PASSPHRASE" --output .df-credentials.json .df-credentials.json.gpg
        env:
          CREDENTIALS_GPG_PASSPHRASE: ${{ secrets.CREDENTIALS_GPG_PASSPHRASE }}
      - name: Execute dataform run
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: run

This workflow's final step runs dataform run, which uses the decrypted warehouse credentials file.
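Before committing the encrypted file, it can be worth checking locally that the passphrase round-trips. The sketch below encrypts a placeholder file and decrypts it with the same flags the workflow step uses. The file contents, the passphrase value, and the `roundtrip.json` output name are made up for illustration, and `--pinentry-mode loopback` is an added assumption that GnuPG 2.1 or later is installed, where that option lets `--passphrase` work in batch mode.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so no real credentials are touched.
workdir="$(mktemp -d)"
export GNUPGHOME="$workdir/gnupg"   # isolated GnuPG state for this demo only
mkdir -m 700 "$GNUPGHOME"
cd "$workdir"

# Placeholder content only; a real .df-credentials.json has warehouse-specific fields.
printf '%s\n' '{"placeholder": true}' > .df-credentials.json

# Hypothetical passphrase for the demo; in CI it comes from the GitHub secret.
CREDENTIALS_GPG_PASSPHRASE='example-passphrase'

# Encrypt non-interactively (the manual command in step 2 prompts for the passphrase instead).
gpg --quiet --batch --yes --pinentry-mode loopback \
    --passphrase="$CREDENTIALS_GPG_PASSPHRASE" \
    --symmetric --cipher-algo AES256 .df-credentials.json

# Decrypt the way the workflow step does, writing to a separate file for comparison.
gpg --quiet --batch --yes --pinentry-mode loopback --decrypt \
    --passphrase="$CREDENTIALS_GPG_PASSPHRASE" \
    --output roundtrip.json .df-credentials.json.gpg

# cmp exits non-zero if the decrypted file differs from the original.
cmp .df-credentials.json roundtrip.json
echo "round trip OK"
```

If the comparison fails or gpg reports a bad passphrase, the secret stored in GitHub would not decrypt the committed file either, so it is cheaper to catch the mismatch here than in a failed workflow run.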

Branch protection

CI/CD checks are intended to prevent project breakages before they happen. Therefore, we strongly recommend turning on branch protection for the master branch. With branch protection, you can require that all changes to master are made through a pull request, and that each pull request passes the checks you specify before it can be merged.

What's next

Publish data tables and views

Learn how to configure, publish and document data tables in your warehouse.

SQLX

Learn about the structure and features of SQLX files.

Test data quality with assertions

Learn how to test data quality with assertions.

Declare external datasets with declarations

Learn how to declare external datasets with declarations.

Write custom SQL operations

Learn how to define custom SQL operations in Dataform.

Configure your project

Learn how to configure your Dataform project.

Power your code with JavaScript

Learn how you can use JavaScript to re-use code across your scripts and define several actions.

Organise your project with tags

Learn how to organise your project with tags.

Run unit tests on your queries

Learn how to run unit tests on your queries.
