SQLX is a powerful extension of SQL. As it is an extension, every SQL file is also a valid SQLX file!
SQLX contains the following components:
- Config: contains information on the purpose of the script, such as view or table.
- SQL: the SQL operation(s) to be performed. In-line JavaScript or built-in functions can be injected here.
- JavaScript (and in-line JavaScript): these provide all the incredible functionality of JavaScript written alongside SQL!
- Built-in functions: there are various useful built-in functions that can be used, such as ref() or self().
All config properties, and the config itself, are optional. See ITableConfig in the API reference for exact options.
Anything written outside of control blocks ({}) is interpreted as SQL. Therefore, to start an SQL block, just close off any prior blocks.
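For example, a minimal sketch (the schema, description, and table names are illustrative); once the config block is closed, everything that follows is treated as SQL:

```
config {
  type: "view",
  schema: "reporting",
  description: "A cleaned-up view of the raw orders data."
}

SELECT * FROM raw_orders
```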
JavaScript can be used within SQLX via a JavaScript block, which can then be injected into the SQL using in-line JavaScript in order to dynamically modify the query.
For example:
```
js {
  const example = "foo";
}

SELECT * FROM ${example}
```
JavaScript blocks are defined in SQLX by writing js { }.
JavaScript blocks in SQLX can be used to define reusable functions that generate repetitive parts of SQL code.
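For example, a sketch in which a hypothetical countNulls() helper builds a repeated aggregate expression (the orders dataset and column names are also illustrative):

```
js {
  // Hypothetical helper: builds a null-counting aggregate for a column.
  function countNulls(column) {
    return `SUM(CASE WHEN ${column} IS NULL THEN 1 ELSE 0 END) AS ${column}_null_count`;
  }
}

SELECT
  ${countNulls("user_id")},
  ${countNulls("order_id")}
FROM ${ref("orders")}
```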
In-line JavaScript can be used anywhere SQL is written in order to dynamically modify the query. It is injected by using ${}, for example ${console.log("foo")}.
Built-in functions have special functionality and can be executed either within in-line JavaScript or JavaScript blocks. For all built-in functions, see ITableContext in the API reference. Some of the most useful are described below:
ref() enables you to easily reference another dataset in your project without having to provide the full SQL dataset name. ref() also adds the referenced dataset to the set of dependencies for the query.
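For example, a sketch of a view that selects from another dataset in the same project (the orders name is illustrative); at compile time the reference is replaced with the dataset's fully qualified name, and orders becomes a dependency of this query:

```
config { type: "view" }

SELECT customer_id, order_total
FROM ${ref("orders")}
```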
resolve() works similarly to ref(), but doesn't add the dataset to the dependency list for the query.
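A sketch, assuming an illustrative raw_events dataset; the reference compiles to its fully qualified name, but no dependency is created, so Dataform will not wait for raw_events before running this query:

```
config { type: "table" }

SELECT * FROM ${resolve("raw_events")}
```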
self() returns the fully qualified name (database, schema and name) of the current dataset. If the database, schema, or dataset name is overridden in the config {} block, self() will return the full and correct dataset name.
name() returns the name of the current dataset.
Here is an example of an incremental table using the self() function.
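A minimal sketch, assuming an illustrative raw_events source; it also uses the when() and incremental() built-ins (also part of ITableContext) so that the self()-based filter is only applied on incremental runs:

```
config { type: "incremental" }

SELECT event_timestamp, user_id, action
FROM ${ref("raw_events")}
${when(incremental(), `WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM ${self()})`)}
```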
Pre-operations: defined in SQLX by writing pre_operations { }. SQL written inside will be executed before the main SQL. This can be useful for granting permissions, as can be seen in the publishing datasets guide. Actions may only include pre_operations if they create a dataset, for example with type: "table", type: "view", or type: "incremental" in their config.
Post-operations: the same as pre-operations, but defined with post_operations { }, and run after the main SQL.
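A structural sketch showing both blocks around the main query. The run_audit and raw_orders datasets are hypothetical, and the GRANT statement uses BigQuery syntax; adjust the statements to whatever your warehouse supports:

```
config { type: "table" }

pre_operations {
  -- Hypothetical audit step, run before the main query.
  -- Assumes a run_audit table already exists in the project.
  INSERT INTO ${resolve("run_audit")} (dataset_name, started_at)
  VALUES ('${name()}', CURRENT_TIMESTAMP())
}

post_operations {
  -- Runs after the table is created, for example to grant read access
  -- (BigQuery GRANT syntax shown; the group address is a placeholder).
  GRANT `roles/bigquery.dataViewer` ON TABLE ${self()} TO "group:analysts@example.com"
}

SELECT order_id, customer_id, order_total
FROM ${ref("raw_orders")}
```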
BigQuery and SQL Data Warehouse run all operations for a file (such as pre- and post-operations) in the same context; the executed SQL is created by joining all operations with a semicolon (;). For other warehouse types, operations are run as separate queries.
This is useful for scripting, for example defining variables or UDFs in BigQuery before the create table statement.
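For example, a sketch of a BigQuery scripting pattern (the events dataset and column names are illustrative): because the pre-operation and the main query run in the same context, the declared variable is visible to the main query.

```
config { type: "table" }

pre_operations {
  -- BigQuery scripting: DECLARE must come first in the script, and the
  -- variable remains visible to the main query below.
  DECLARE start_date DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY);
}

SELECT *
FROM ${ref("events")}
WHERE event_date >= start_date
```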
Dataform executes all JavaScript and SQLX code in a Dataform project inside a sandboxed environment that has no access to the database, data, or network.
While this can be limiting in some cases, there are many benefits to this approach.
Dataform compilation is extremely fast, and can compile projects with hundreds of datasets and tens of thousands of lines of code in under a second. This is useful for development where you can immediately see what your code will do without waiting.
Changing your generated SQL queries based on data in the warehouse can lead to pipelines breaking even when you haven't changed code. This can be frustrating to debug, and means table structures can change for downstream consumers without notice. With Dataform and SQLX, your build steps and generated table schemas are generally fixed, making debugging and reproducibility much easier.
Third-party code (such as Dataform packages) shouldn't be able to read any of your data. As all project code runs in the sandbox, there is no way to read and leak your data; instead, the worst a malicious package could do is cause SQL queries to execute in your warehouse.