Building your data model

Learn how to connect to a warehouse and create and publish your first dataset.

Now you have your BigQuery project and warehouse credentials, you’re going to set up your Dataform project using Dataform Web. In this step you'll create a project, connect your warehouse and build out your first dataset.

Create a Dataform project

To create a new project in Dataform:

  1. Create a Dataform account here.

  2. Follow the sign up flow:

    • We will ask you for a bit of information about yourself and what you’re hoping to achieve with Dataform.
    • As part of the sign up flow we will create a new project for you. Give it the name Dataform Tutorial .
    • If you already have an account, you can create a new project by going to the homepage and clicking New Project .
Creating a new project

Connecting a warehouse

As part of the project creation flow you'll be asked to connect to a warehouse. You’re going to use the credentials you generated in the earlier part of this tutorial to connect:

  1. On the Configure Warehouse modal click Connect .

  2. Select Google BigQuery from the drop down menu.

  3. Enter your Project ID.

    • This can be found by going to the BigQuery console and looking at the Project info box.
  4. Browse for the service account key JSON file you created in the Setting Up part of this tutorial and upload it.

  5. Check your connection is working:

    • Press Test Connection to check that the connection is working.
    • Once this is successful, you can press Save Connection .
Connecting a warehouse

Creating a dataset

Now that you've created your project and connected a warehouse, you're ready to start defining your data model.

  1. Make sure you are in a development branch:

    • By default you should already be in a development branch called yourname_dev .
    • However, if you would like to create a new branch, click Develop and select New Branch .
    • Give your branch a name.
  2. Create a new dataset:

    • Click on the New Dataset button in the left hand side bar.
    • Choose whether you want your dataset to be a table, view or incremental table. In this case we want to create a table.
    • Name the table order_stats and click Create Table .

In Dataform transformations are defined using SQLX

SQLX is an extension of SQL. A typical SQLX file contains a SELECT statement defining a new table or a view and a config block at the top. The config block is used to specify additional options for the table or view. To see a full list of options for the config block, open the Documentation tab.
  1. Define the dataset:

    • To create your table use the below query which joins orders information from Shopify, payment details from Stripe and also applies some filters to the data:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SELECT
  orders.date AS order_date,
  orders.id AS id,
  orders.customer_id AS customer_id,
  orders.status AS order_status,
  charges.status AS payment_status,
  charges.payment_method AS payment_method,
  SUM(orders.item_count) AS item_count,
  SUM(charges.amount) AS amount

FROM
  dataform-demos.dataform_tutorial.shopify_orders AS orders
  LEFT JOIN dataform-demos.dataform_tutorial.stripe_payments AS charges
    ON orders.payment_id = charges.id

WHERE
  orders.id <= 999
  AND orders.item_count > 0
  AND orders.status <> 'internal'
  AND charges.payment_method IN ('debit_card', 'subscription', 'coupon')

GROUP BY 1, 2, 3, 4, 5, 6
  • Paste the query into order_stats.sqlx below the config block.
  • Dataform will automatically validate your query and check for any errors
  • Once you see that the query is valid you can click Preview Results to check that the data looks correct
  1. Create the table in your warehouse:

    • Click Publish Table to create the table in your warehouse
    • This will take the SQLX that we’ve written, compile it into the SQL syntax of your warehouse (in this case, BigQuery), and then execute that SQL in your warehouse with the correct boilerplate code to create a table
  2. Check Run Logs:

    • You can see the progress of the run in the header menu.
    • Once you see that it has been successful, you can view Run Logs by clicking on the hamburger menu in the top left hand corner.
    • In Run Logs you can see all the past runs in the project. You can see their status, as well as how and when they were triggered.
    • For each run you can see the exact SQL run against the warehouse to create your datasets by clicking the Details button.
Run Logs

You now have a new table called order_stats which has been created in your warehouse and you're ready to add to your data model!

For more detailed info on publishing datasets in Dataform, see our [docs].(https://docs.dataform.co/guides/datasets/publish)

What's next

Getting set up

Learn how to create a new BigQuery project and generate warehouse credentials.

Managing dependencies

Learn how to use the ref function in Dataform and how to view your project in the Dependency tree.

Setting up a schedule

Learn how to set up a schedule and alerts in Dataform

Data quality tests and documenting datasets

Learn how to set up data quality tests using assertions and how to document your datasets

Committing your changes

Learn how to committ changes you've made in your Dataform project

Sitemap