Each Dataform project is a repository with a collection of JSON configuration files, SQLX files, and sometimes JS files. Dataform projects contain three types of files:
Config files let you configure your Dataform project. It includes general configuration like the type of warehouse you are using or what schema Dataform should use when creating new tables and views. It also includes configuration for schedules and more advanced use-cases like packages and environments.
Definitions is where you add the SQLX files that will define new tables, views, assertions (data quality tests) and other SQL operations that will run in your warehouse.
Includes is where you can add JS files where you define variables and functions that you can use across your entire project. You can learn more about includes on this page.
In Dataform, you develop in SQLX. SQLX being an extension of SQL, any SQL file is a valid SQLX file. A typical SQLX file will contain a SELECT statement defining a new table or a view and a config block at the top.
definitions/new_table.sqlx1 2config { type: "table" } 3 4select 5 order_date as date, 6 order_id as order_id, 7 order_status as order_status, 8 sum(item_count) as item_count, 9 sum(amount) as revenue 10 11from ${ref("store_clean")} 12 13group by 1, 2
Dataform compiles your entire project in real-time, regardless of the number of tables you define. During this step, all SQLX is converted into pure SQL, in the dialect of your data warehouse. The following actions are happening during compilation:
CREATE TABLE
or INSERT
statements are added to the code following configuration in the config blockIncludes
are transpiled into SQLref(
function is resolved onto the name of the table that will be createdcompiled.sql1create or replace table "dataform"."orders" as 2 3select 4 order_date as date, 5 order_id as order_id, 6 order_status as order_status, 7 sum(item_count) as item_count, 8 sum(amount) as revenue 9 10from "dataform_stg"."store_clean" 11 12group by 1, 2
Dataform connects to your data warehouse to run SQL commands in your data warehouse, following the order of the dependency tree.
After the run, you can consult logs to see what tables were created, if assertions passed or failed, how long each action took to complete, and other information. You can also consult the exact SQL code that was run in your warehouse.
You can use your tables for all your analytics purposes.