Dataform
This guide is for integrating Alvin with the built-in Dataform service provided by GCP.
Grant the BigQuery Resource Editor role to the Alvin-connected service account and to all users and service accounts that run Dataform jobs. The role must be granted in the project that hosts the reservation(s) for your organization, so that the reservation override functionality can be used.
Every identity that runs Dataform jobs needs this same permission. Most commonly this is the Dataform-managed service account, but it can also be user-defined. Example of the Dataform-managed service account:
service-{project_number}@gcp-sa-dataform.iam.gserviceaccount.com
The following config block must be added to the beginning of each pre_operations block. If no pre_operations block is being used then one must be created.
pre_operations {
${when(dataform.projectConfig.vars.alvin_proxy === "true", `SET @@reservation = \`{alv_provisioned_customer_project}.alvin_udfs_{dataform_region}.udf_reservation_routing\`(SESSION_USER(), @@project_id, CURRENT_DATETIME(), '${self()}', null, '${incremental()}');`, "")}
}
Replace the variables with values provided by the Alvin team:
alv_provisioned_customer_project
dataform_region
The following metadata from the execution context is passed to Alvin so that it can be matched with the historical data:
SESSION_USER() => so routing can be triggered to specific users
@@project_id => so routing can be triggered to specific projects
CURRENT_DATETIME() => so routing can be adjusted according to execution time
${self()} => dataform model identifier
${incremental()} => to distinguish between incremental executions
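To make the argument order easier to follow, here is a minimal standalone JavaScript sketch that assembles the same SET statement outside of Dataform. The function name buildReservationStatement and the sample values (example-alvin-project, us, analytics.orders) are hypothetical and only for illustration; the real values come from the Alvin team and from Dataform at compile time.

```javascript
// Standalone sketch: mirrors the argument order of the routing call in the
// pre_operations block. All sample values below are hypothetical placeholders.
function buildReservationStatement({ alvProject, region, selfName, isIncremental }) {
  const udf = `\`${alvProject}.alvin_udfs_${region}.udf_reservation_routing\``;
  const args = [
    "SESSION_USER()",      // resolved by BigQuery when the query runs
    "@@project_id",        // resolved by BigQuery when the query runs
    "CURRENT_DATETIME()",  // resolved by BigQuery when the query runs
    `'${selfName}'`,       // stands in for '${self()}', filled in at compile time
    "null",                // passed through unchanged, as in the config block
    `'${isIncremental}'`,  // stands in for '${incremental()}', filled in at compile time
  ];
  return `SET @@reservation = ${udf}(${args.join(", ")});`;
}

const statement = buildReservationStatement({
  alvProject: "example-alvin-project", // hypothetical
  region: "us",                        // hypothetical
  selfName: "analytics.orders",        // hypothetical
  isIncremental: false,
});
console.log(statement);
```

Note that the first three arguments stay as SQL expressions in the emitted statement, while the model identifier and the incremental flag are embedded as string literals.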
If there are several environments, the variable dataform.projectConfig.vars.alvin_proxy can be used as a quick on/off toggle for routing, and it can be set per Dataform execution invocation. This lets engineers' development and ad hoc executions run as normal before changes are pushed to the production branch, with Alvin enabled only for specific execution types.
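The toggle behavior can be sketched in plain JavaScript. The when function below is a local stand-in, assumed to behave like a ternary in the same way as the helper used in the config block; routingStatement and the ROUTING_SQL placeholder string are hypothetical names for illustration only.

```javascript
// Local stand-in for the when() helper used in the config block,
// assumed here to behave like a simple ternary.
function when(condition, trueCase, falseCase = "") {
  return condition ? trueCase : falseCase;
}

// Placeholder for the full SET @@reservation statement (hypothetical).
const ROUTING_SQL = "<routing statement>";

// The config block compares against the string "true" with ===,
// so any other value (including the boolean true) leaves routing off.
function routingStatement(vars) {
  return when(vars.alvin_proxy === "true", ROUTING_SQL, "");
}

console.log(JSON.stringify(routingStatement({ alvin_proxy: "true" })));  // routing on
console.log(JSON.stringify(routingStatement({ alvin_proxy: "false" }))); // routing off
console.log(JSON.stringify(routingStatement({})));                       // variable unset: routing off
```

Because the comparison is strict, the variable has to be supplied as the exact string "true" for the routing statement to be emitted; executions that leave it unset run without the reservation override.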
