CI/CD of Synapse Serverless Part 2/2
In the first part we’ve explored the reasons why using CI/CD is important in projects. Additionally, we’ve defined three core concepts that need to be implemented as best-practices in Synapse Serverless databases. In this part, we’ll explore how we can continue using our beloved technical tooling ‘Visual Studio’ together with Git and Python in order to implement CI/CD of Synapse Serverless.
What we want to achieve?
To be able to implement the CI/CD pipeline, we need to create one long SQL script that drops (if exists) and recreates all objects. Good to know is that a script can make use of other scripts by using SQLCMD. The referenced scripts actually don’t even need to be part of the database project, they only need to be made available on the GIT folder.
We notice following files:
- A folder ‘Python Scripts’ with two files:
- json: This is a file where you can store your credentials for manual database comparisons on Synapse Serverless. Additionally, you also define where the results of a file compare need to be stored.
- py: This is a Pythons script that can be run on-demand to pull all changes from a Synapse Serverless database into the current solution. All external tables and views will be added to your Git folder.
- sql is an auto-generated file resulting from ‘ServerlessCompare.py’. It calls other sql scripts (that are available in the solution folder, but not in the solution as such) to make the elevated environments (= Acceptance, Production) consistent to what is in source control.
- sql is a script that could contain manual alterations to the database. This is added to this solution as an example that scripted alterations to elevated environments are still possible.
The ‘DeployScript.sql’ uses SQLCMD to call other scripts that drop (if exists) and recreate objects in the Synapse Serverless environment.
Step 1: Creating a Visual Studio SQL Database Solution
Firstly, we want to create a Visual Studio SQL Database Solution. By using a Database Solution, we still have somewhat of a build to check if the referenced SQLCMD scripts do exist in the folder. If this would not be the case, then a build job would result in an error.
As we are not using .dacpac files for deployments, more advanced builds (such as datatype consistency and/or missing depending objects in views) cannot be implemented.
Step 2: Migrate to Visual Studio using the Python script
After we’ve created our database solution, we will probably want to sync this solution with a local database. You may probably already have worked on a Serverless Synapse instance, hence are in need of copying all your external tables and views to this solution.
While ‘manual’ may be an option, this is not really something we want to do every now and then.
The python script ‘Serverless Compare.py’ uses the PyODBC package and SQL to list all external tables and views in a Pandas DataFame. This DataFrame has three columns; the type of the object (a view or an external table), the name of the object, the SQL code to drop and recreate the object. Once this information has been made available in a Pandas DataFrame, we will want to be storing them in a file per object. By iterating over the Pandas object, we perform various checks:
- If the object is new to the solution, it is simply appended.
- If the object is not new, a comparison is done to only touch the file if changes actually happened (this to reproduce the pollution in Git).
- If the logic hasn’t changed, nothing happens.
- If the logic of the object has changed (e.g. new columns), the file is dropped and recreated.
Lastly, our Python script also recreates the ‘DeployScript.sln’. It is important to firstly create external tables before creating views to not result in dependency errors.
You can test if everything would go well by running the ‘DeployScript.sql’ and declaring the SQLCMD variable.
The External Table Deployments
As stated in part 1 of this blog series, you cannot simply deploy external tables that are not linking to any file. The way this is tackled in this Python script is that we firstly deploy by declaring our external tables as CETAS expressions. CETAS (Create External Table As) expressions create files in your data lake. We will be using this functionality to create a dummy file that contains just the same metadata structure as the external table you want to publish. The next ETL run will drop and recreate each external table, resulting in the desired state.
Step 3: Implementing the DeployScript in Deployment Pipelines
Lastly, we will want to include the execution of this script in the CI/CD Deployment Pipeline. This can be done by using an ‘Azure SQL Database Deployment’ task that references our SQL Script.
Voila, we’ve enabled CI/CD for a Synapse Serverless solution. While we hope Microsoft comes with a better solution, we as Lytix will continue building on this platform to integrate the wonderful technology Synapse Serverless in the DTAP environments.