Synapse Dedicated SQL Pool vs. Serverless SQL Pool

Microsoft’s Synapse Analytics Suite aims to provide solutions for various data usage, consumption, and exploratory situations while also ensuring that clients can control costs in a straightforward and predictable manner. To address these distinct circumstances, Microsoft has built two separate techniques to controlling and allocating computation. In this blog we will go through both techniques. 

The first technique uses the Dedicated SQL Pool or provisioned pool. It uses a node-based architecture where applications can connect and issue T-SQL to a control node. That control node is the single point of entry in Synapse SQL.   

The Serverless SQL Pool, also known as an on-demand pool, is the second approach. This is a node-based design, just as the Dedicated SQL Pool. There is a substantial difference between the Dedicated SQL Pool and the Serverless SQL Pool to which Microsoft provides all computational resources. 

The provisioned Spark Pool is a third Synapse pool that will not handle in detail in this blogpost.

Dedicated SQL Pool

The Dedicated SQL Pool is seen as the Enterprise Data Warehouse solution were the SQL pool will become the single version of the truth. 

It is build on a MPP engine, which stands for Massive Parallel Processing engine. Scaling can be done independently because the compute power and storage are decoupled.  
The control node on top is the brain of the MPP engine. It coordinates all compute activity. When you submit a T-SQL query to the SQL Pool, the control node transforms it into query’s that run against each distribution in parallel. It is also used as endpoint to connect to with e.g. SQL Server Management Studio (SSMS), Azure Data Studio or other third party tools.  
Next we have the compute nodes. They handle the query processing. Within the Dedicated SQL Pools, you can always choose to scale-up or down. Each Dedicated SQL Pool is stored as 60 distributions. Those distributions are evenly distributed to the compute nodes. Increasing the compute nodes, will decrease the number of distributions per compute node. 
E.g. In the picture below there are 4 compute nodes. Each compute node has 15 distributions. 

When we would recommend to use it:

If you are using DirectQuery in Power BI

The Dedicated SQL pool would be ideal for using DirectQuery in Power BI. In a short length of time, a Power BI user may click a lot of boxes or graphics, and before you know it, you’ve queried terabytes of data. 
Alternative to save costs the data can be imported into Power BI, and the Dedicated SQL pool can be switched off.  

When you need a consistent level of performance and cost

The size of the pools are determined by Data Warehousing Units or DWU’s. A DWU represents an abstract, normalized measure of compute resources and performance. 

We know how many compute nodes we have at our disposal and how much it will cost us per hour depending on the DWU we choose. 
E.g. DW500c has 1 compute node with 250 GB memory, and costs 6,5 EUR/hour 

If you want to optimize your compute strategy

With Dedicated SQL pools you are in the control seat.  

  • You can create multiple SQL pools for each department in your organization. 
  • With workload management you can create groups of users to query your pool with higher or lower priority. 
  • Define a resource class, the resource class uses concurrency slot to measure the resource consumption. Before a query can execute It must be able to have enough concurrency slots. When the query is finished the concurrency slot is released. E.g. DW1000c has 32 concurrency slots . Please keep in mind that some queries may require more than one concurrency slot. 
  • Scale your SQL pool up by adding more compute nodes when more compute power is needed, and scale down again when there’s less activity. Or put the pool in pause when your data is loaded into a Power BI dataset.  

Serverless SQL Pool

What’s the meaning of “Serverless” in the Serverless SQL Pool. Serverless implies that Microsoft manages and allocates the compute power of this SQL pool. Also there is no infrastructure to setup or clusters to maintain.  

The Serverless SQL Pool is created simultaneously when your create your Synapse Workspace resource in Azure. For each Synapse workspace there is a default built-in endpoint for a Serverless SQL database. 

 There is no DWU measure unit to choose from to have more or less compute power (= no scaling-up or scaling-down). Because everything is managed by Microsoft. As a result, a new payment model has been implemented. “Pay-per-query” or in other words you only pay for the amount of data that is processed.  
4,312 EUR / TB processed to be exactly.  

 It’s a node-based design based on the DDP system, or Distributed Data Processing system. In this system query’s are split into smaller query’s, and executed on the compute nodes. The control node on top utilized the distributed query engine to optimize the query’s for parallel processing. Each small query is called a “task” and represents a distributed execution unit. 

When we would recommend to use it:

To explore the data in ad-hoc modus
You can analyze CSV, Parquet, Delta and other file formats stored in Azure Storage by using simple T-SQL query’s. 

Build a Logical Data Warehouse
Create a relational structure over the raw data stored in the Azure Storage without transforming or moving data. There is no data loading, the data is available for immediate querying.  

Data transformations
The data stored in Azure Storage can be transformed using T-SQL. The result can be loaded into another data store e.g. a Dedicated SQL Pool, an Azure SQL database or the Data Lake). 

Delta Lake

In this blog we told you we won’t talk about the provisioned Spark Pools. But we don’t want to keep the possibility of querying delta lake files with a Serverless SQL Pool hidden from you!  

With Delta Lake we can update, merge and delete parquet files on the Data Lake. We’re not going to delve too deeply into the Spark Pools and will keep things simple. 

First step is to spin up an Apache Spark pool in Synapse. This can be a small pool with 4vCores and 32GB RAM with 3 to 10 nodes.  

Second we will need to convert our parquet files into the delta file format. This can be accomplished with only one line of code.

 When this command is executed a “_delta_log” folder is created. 

Then we will use SQL code to construct a table. Using “%%sql”, the SQL code is enabled.

Let’s try to query the results and perform an update on the table.

 We can test the result of the update in our notebook, but we would prefer to verify by using T-SQL in the Serverless SQL Pool. 

  • The format is now change to delta 

Dedicated SQL Pool vs. Serverless SQL Pool

Let’s have a look at how the two SQL pools vary in Synapse.

 

Dedicated SQL Pool 

Serverless SQL Pool 

Compute 

Define a compute strategy 

Compute is managed by Microsoft 

Payment Model 

Cost per hour (DWU) 

Cost per TB of data processed 

Scaling 

Manual scaling (up/down) 

Auto scaling 

Engine 

MPP-engine 

DPP-engine 

Use Cases 

Predictable performance and cost 

Ad-hoc explorations, Data Lakehousing 

Data Retrieval 

Fast: Distribution Strategy’s can be implemented (Round-robin, Hash, …), Query Caching.  

Mediocre: Results cannot be cached, data needs to be fetched from storage. 

Conclusion

If you want to start using Azure Synapse and build up a Data Warehouse from scratch or you want to migrate your on-prem DWH, we suggest to start with the Serverless SQL Pools. With Synapse Pipelines the data can be ingested into the Azure Storage or Data Lake and immediately be transformed into a parquet/delta format. With the serverless solution we can query the result directly from the Azure Storage to verify the result. Then, on top of the parquet files, a view can be created. Which can be imported into a Power BI dataset via a Power BI workspace that is linked to the Azure workspace. 

A second approach could be to process and enrich your data using Dedicated SQL Pools and offload it to your Data Lake. The Dedicated SQL Pool can be put on pause to save costs, and the Serverless Pool can take over to create views or external tables for querying, and afterwards it can be imported into Power BI.   

If you need assistance determining the appropriate data architecture in Azure Synapse, contact Lytix and we can walk you through the process. 

Pieter-Jan Serlet

BI Solution Architect @Lytix