Continuous Integration / Continuous Deployment (CI/CD) workflows help prevent code changes from unintentionally breaking your Dataform project.
Typically, these workflows are configured to run on every commit to your Git repository, automatically checking that the commit doesn't break your code.
Dataform distributes a Docker image which can be used to run the equivalent of Dataform CLI commands.
For most CI/CD tools, this Docker image is what you'll use to run your automated checks.
If you host your Dataform Git repository on GitHub, you can use GitHub Actions to run CI/CD workflows. See the GitHub Actions documentation for more about configuring workflows.
GitHub workflows are defined in YAML and must exist in a `.github/workflows` directory in your Git repository. Once the workflow file is added to this directory, GitHub automatically runs it on the events you specify in the workflow configuration.
For example, in a `.github/workflows/dataform.yaml` file:
```yaml
name: CI

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  compile:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      - name: Install project dependencies
        uses: docker://dataformco/dataform:latest
        with:
          args: install
      - name: Run dataform compile
        uses: docker://dataformco/dataform:latest
        with:
          args: compile
```
This workflow is configured to run on every commit to the `master` branch and for all pull requests to the `master` branch.
It contains three steps:
- Checking out the code into the workspace directory
- `dataform install`, installing project dependencies into the working directory
- `dataform compile`, checking that the project still compiles correctly

If any of these steps fails, the workflow fails. The result is shown in the GitHub UI, including logs to help you understand what, if anything, broke.
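To reproduce a failing check before pushing, the same two commands can be run locally. The sketch below is a minimal illustration, assuming the Dataform CLI is installed and available on your `PATH` as `dataform`; `run_ci_checks` is a hypothetical helper name, not something Dataform provides.

```shell
#!/usr/bin/env bash
# Sketch: run the workflow's two Dataform steps locally, stopping at the
# first failure, which mirrors how the workflow fails if any step fails.
# Assumption: the Dataform CLI is installed and on PATH as `dataform`.

# Hypothetical helper; takes the project directory (defaults to the current one).
run_ci_checks() {
  local project_dir="${1:-.}"
  (
    # The subshell keeps the caller's working directory unchanged.
    cd "$project_dir" || exit 1
    dataform install || exit 1   # install project dependencies
    dataform compile || exit 1   # check that the project still compiles
  )
}
```

For example, `run_ci_checks path/to/project && echo "checks passed"` prints the message only when both commands succeed.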
You may want to run Dataform CLI commands that require credentials to access your data warehouse (for example, `dataform test` or `dataform run`). The Dataform CLI expects credentials to exist in a `.df-credentials.json` file. However, it would be insecure to commit that file to Git.
Fortunately, GitHub Actions supports "secrets". A GitHub secret is configured in your GitHub project settings. Once configured, you can use this secret to decrypt your warehouse credentials as part of a GitHub Actions workflow.
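Encrypt your `.df-credentials.json` file by following these steps:
- Run `gpg --symmetric --cipher-algo AES256 .df-credentials.json`, using a secret passphrase
- Commit the encrypted `.df-credentials.json.gpg` file
- Store the passphrase as a GitHub secret named `CREDENTIALS_GPG_PASSPHRASE`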
```yaml
name: CI

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  compile:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      - name: Install NPM dependencies
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: install
      - name: Decrypt dataform credentials
        run: gpg --quiet --batch --yes --decrypt --passphrase="$CREDENTIALS_GPG_PASSPHRASE" --output .df-credentials.json .df-credentials.json.gpg
        env:
          CREDENTIALS_GPG_PASSPHRASE: ${{ secrets.CREDENTIALS_GPG_PASSPHRASE }}
      - name: Execute dataform run
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: run
```
This workflow's final step runs `dataform run`, which uses the decrypted warehouse credentials file.
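Before relying on the workflow, it can help to confirm locally that a file encrypted with your passphrase decrypts cleanly. The following is a minimal sketch of that round trip, not a required part of the setup. It assumes GnuPG 2.1 or later (for `--pinentry-mode loopback`, which lets the encryption step take the passphrase non-interactively), and it uses a dummy credentials file, a placeholder passphrase, and a throwaway directory, all of which are illustrative.

```shell
#!/usr/bin/env bash
# Round-trip sketch: encrypt a credentials file with a passphrase, then
# decrypt it with the same kind of command the workflow step uses.
# Everything happens in a throwaway directory with dummy contents.
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Use a temporary GnuPG home so the user's real keyring is untouched.
export GNUPGHOME="$workdir/gnupg"
mkdir -m 700 "$GNUPGHOME"

# Hypothetical stand-in for real warehouse credentials.
printf '{"example": "not-a-real-credential"}\n' > .df-credentials.json

# Placeholder value; a real passphrase should never be written into a script.
CREDENTIALS_GPG_PASSPHRASE="example-passphrase"

# Encrypt non-interactively; gpg writes .df-credentials.json.gpg.
gpg --quiet --batch --yes --pinentry-mode loopback \
    --passphrase "$CREDENTIALS_GPG_PASSPHRASE" \
    --symmetric --cipher-algo AES256 .df-credentials.json

# Decrypt to a separate file so the two versions can be compared.
gpg --quiet --batch --yes --pinentry-mode loopback \
    --decrypt --passphrase "$CREDENTIALS_GPG_PASSPHRASE" \
    --output decrypted.json .df-credentials.json.gpg

# Compare the original and the decrypted copy byte for byte.
if cmp -s .df-credentials.json decrypted.json; then
  roundtrip_result="match"
else
  roundtrip_result="mismatch"
fi
echo "round trip: $roundtrip_result"

# Stop the agent that gpg started for the temporary home.
gpgconf --kill gpg-agent || true
```

In a real repository only the `.df-credentials.json.gpg` file would be committed; the plaintext file stays out of Git.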
CI/CD checks are intended to prevent project breakages before they happen. Therefore, we strongly recommend turning on branch protection for the `master` branch. With branch protection, you can require that all changes to `master` are made through a pull request, and that those pull requests pass the checks you specify before they can be merged.