Snowflake, a DWH as a Service

‘Cloud’ or ‘Clowned’? With cloud computing being widely adopted across business processes, analytics is the area that can benefit from it the most. By analytics we mean a data warehouse used for reporting or ad-hoc analysis, a data lake that serves as a repository for data with added value, and Big Data, where one server isn’t sufficient to meet your analytical needs. Typical of analytics are the combination of different sources, highly variable usage (on some days it is used intensively, on others it is not consulted at all) and the need for lots of computational power.

As a Cloud PaaS solution, Snowflake is taking the data world by storm. This tool has a lot of benefits, but is it really as good as it claims to be? Can it address the challenges listed above? This blog post goes through the most prominent advantages and disadvantages of Snowflake.

But… what is PaaS?

Before we dive into the nitty-gritty of Snowflake, it is important to define what a ‘Platform as a Service’ (PaaS) is.

As the name suggests, a platform is offered as a service on which customers can carry out their developments without having to take care of the underlying infrastructure. This infrastructure is provided and maintained by the owner of the PaaS solution.

So, everything is set for me to develop! What are the limitations then?

Well, exactly as you say. Everything is set up for you, so you cannot change anything in the infrastructure. Any limitations the solution has, you will either need to live with or hope that the provider resolves them.

This is an issue you don’t have when you build the platform yourself. If you need additional infrastructure developments or changes, you can plan and execute them yourself.

It is always best to check what the PaaS solution can offer you and whether it ticks all the boxes you need.

Where does Snowflake fit in here?

Snowflake offers a lot of capabilities you can use during your journey through data without having to take care of your environment. All the technical set-up is done for you, hence it is PaaS.

What makes Snowflake so special besides being PaaS?

We will look into a few aspects that make Snowflake shine.

  • Cloud Availability

One of the main selling points of Snowflake is that it is fully cloud-based and not an on-premises system made available in the cloud.

It’s available on every major cloud service on the market, meaning Azure, AWS and Google Cloud. On top of this, Snowflake maintains a cross-cloud strategy.

This strategy ensures the following:

  • The possibility to negotiate the best cloud environment without having any data-migration issues.
  • In case your chosen cloud provider faces an outage, you can simply switch to another cloud provider and ensure uptime of your systems.
  • Data Storage Capacity

You often hear data is the new currency. Nothing could be more true in the business world. Companies keep collecting data and so their volumes keep growing.

Luckily, Snowflake puts no limit on its data storage. As the storage and compute layers are separated, you can just dump all relevant business data into Snowflake, which will then treat it as a data lake (read: cheap and practically unlimited storage).

Of course, you might not need a large amount now, but you can expect your data volumes to increase in the years to come. If you are provided with a large volume of value-added data, you do not need to hesitate and can add it to Snowflake straight away. All data residing within Snowflake (= your data lake) is directly accessible.

Another interesting storage feature is that a copy of a database does not take double the space. Snowflake recognizes the records that are the same and only requires additional space for records that are different or new.
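To make the idea of a space-saving copy more tangible, here is a minimal Python sketch. It is a toy model and not how Snowflake works internally: the classes ToyStore and ToyDatabase, and the record names, are hypothetical and only serve to illustrate that a copy can share unchanged records with its source.

```python
# Toy model of a copy that shares unchanged records with its source.
# Hypothetical illustration only, not Snowflake's actual implementation.

class ToyStore:
    """Physical storage: every stored record value occupies one block."""

    def __init__(self):
        self.blocks = {}   # block id -> record value
        self.next_id = 0

    def put(self, value):
        block_id = self.next_id
        self.blocks[block_id] = value
        self.next_id += 1
        return block_id


class ToyDatabase:
    """A database is just a set of references into the shared store."""

    def __init__(self, store, refs=None):
        self.store = store
        self.refs = dict(refs or {})   # record key -> block id

    def write(self, key, value):
        # A new or changed record always gets its own new block.
        self.refs[key] = self.store.put(value)

    def read(self, key):
        return self.store.blocks[self.refs[key]]

    def clone(self):
        # Copy the references only; no record data is duplicated.
        return ToyDatabase(self.store, self.refs)


store = ToyStore()
prod = ToyDatabase(store)
for i in range(1000):
    prod.write(f"order-{i}", {"amount": i})

blocks_before_clone = len(store.blocks)
dev = prod.clone()
blocks_after_clone = len(store.blocks)

# Changing one record in the copy only adds one block of storage.
dev.write("order-1", {"amount": 999})
blocks_after_change = len(store.blocks)

print(blocks_before_clone, blocks_after_clone, blocks_after_change)
```

In this toy model the copy itself costs no extra blocks, and the original keeps its own value for the record that was changed in the copy.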

  • Dynamic Scaling

Snowflake often uses the phrase ‘Pay for what you need’, which is exactly what dynamic scaling is for. This is best explained with an example:

Imagine you run your daily load in the morning between 3 AM and 4 AM. At the start of the working day, a lot of users are interested in the latest insights available within Snowflake (let’s say: between 8 AM and 10 AM). For the rest of the day, only one super user sends a query now and then.

In more traditional cost-effective Cloud scenarios (= not Snowflake), you would start up your BI system only for the periods in which it is planned to be used (= 3 AM to 4 AM, and 8 AM to 6 PM). Your BI environment also takes a while to spin up and spin down.

Snowflake doesn’t incur a delay in spinning up and down and is thus directly accessible, even if no cluster is running. We often configure it in such a way that it immediately spins up whenever someone sends a query and automatically spins down after one minute of inactivity. Paying only when compute is being used will save your organization a lot of money!

“Snowflake spins up directly when users issue queries against the database, even from an idle state. In addition, Snowflake is automatically paused after one minute of inactivity. This combination saves a significant amount of infrastructure costs which would otherwise be billed.”
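The daily pattern from the example can be turned into a rough back-of-the-envelope calculation. The Python sketch below is only an illustration under our own assumptions: the afternoon super user sends five queries of two minutes each, the load and the morning peak keep compute busy without gaps, and every busy period is followed by one minute of idle time before the pause kicks in. It counts minutes of running compute and says nothing about actual pricing.

```python
# Rough comparison of running-compute minutes per day.
# All figures are assumptions for illustration, not Snowflake billing rules.

AUTO_SUSPEND_MINUTES = 1  # idle time before compute is paused (from the example)


def scheduled_minutes(windows):
    """Minutes a traditionally scheduled system is up, given (start_hour, end_hour) windows."""
    return sum((end - start) * 60 for start, end in windows)


def on_demand_minutes(busy_periods):
    """Minutes of compute when running only while busy, plus the idle tail per period."""
    return sum(busy + AUTO_SUSPEND_MINUTES for busy in busy_periods)


# Traditional set-up: up from 3 AM to 4 AM and from 8 AM to 6 PM.
scheduled = scheduled_minutes([(3, 4), (8, 18)])          # 660

# On-demand set-up: 60 min load, 120 min morning peak,
# and five ad-hoc queries of 2 minutes each (assumed).
ad_hoc_queries = [2] * 5
on_demand = on_demand_minutes([60, 120] + ad_hoc_queries)  # 197

print(f"scheduled: {scheduled} min, on demand: {on_demand} min")
print(f"share of compute minutes avoided: {1 - on_demand / scheduled:.0%}")
```

With different assumptions about the afternoon usage the numbers shift, but the pattern stays the same: the quieter the day, the more the on-demand set-up saves.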

  • Instant Adjustable Processing

One of the most interesting features is the ability to change the size of your warehouse. The bigger the size, the faster big queries can be executed. The size can be changed on the fly, depending on the needs of the organization at that point in time. Snowflake uses T-shirt sizes (XS, S, M, L, XL, XXL, XXXL) and each size is double the size of the previous one (= Medium equals twice the processing power of Small).
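The doubling rule is easy to express in a few lines of Python. The sketch below takes XS as the baseline of 1 and uses the sizes listed above. The runtime estimate assumes that a query scales perfectly linearly with processing power, which is an idealization on our side; real queries will rarely scale that neatly.

```python
# Relative processing power per T-shirt size, with XS as the baseline.
SIZES = ["XS", "S", "M", "L", "XL", "XXL", "XXXL"]


def relative_power(size):
    """Each size doubles the power of the previous one: XS = 1, S = 2, M = 4, ..."""
    return 2 ** SIZES.index(size)


def ideal_runtime_minutes(xs_runtime_minutes, size):
    """Estimated runtime on a given size, ASSUMING perfectly linear scaling."""
    return xs_runtime_minutes / relative_power(size)


for size in SIZES:
    power = relative_power(size)
    runtime = ideal_runtime_minutes(64, size)
    print(f"{size:>4}: {power:>2}x power, a 64-minute XS query takes about {runtime:g} min")
```

Under this idealized assumption, power multiplied by runtime stays constant, so a bigger size mainly buys you a faster answer for a heavy query.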

  • Query Memory

It is very important to optimize your query usage as much as possible since you are paying per usage. Luckily Snowflake has this covered for you!

When you execute a query, its result is cached in Snowflake’s engine and stays valid as long as the underlying data doesn’t change.

This means that the second time you execute the query, it will not cost you anything as you will not be querying your data directly but the cached version.
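The caching behaviour can be sketched with a small toy model in Python. The class ToyResultCache and its methods are hypothetical names of our own, and the model is deliberately simplified: it only shows that an identical query on unchanged data is answered from the cache, while a data change forces a fresh computation.

```python
# Toy model of a query result cache.
# Hypothetical illustration only, not Snowflake's actual caching logic.

class ToyResultCache:
    def __init__(self, run_query):
        self.run_query = run_query   # function that does the expensive work
        self.cache = {}              # query text -> cached result
        self.compute_runs = 0        # how often compute was actually used

    def data_changed(self):
        # Cached results may be stale once the data changes, so drop them.
        self.cache.clear()

    def execute(self, sql):
        if sql not in self.cache:
            self.compute_runs += 1
            self.cache[sql] = self.run_query(sql)
        return self.cache[sql]


engine = ToyResultCache(lambda sql: f"result of: {sql}")

query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
first = engine.execute(query)    # computed
second = engine.execute(query)   # served from the cache
runs_before_change = engine.compute_runs

engine.data_changed()
third = engine.execute(query)    # computed again after the data changed
runs_after_change = engine.compute_runs

print(runs_before_change, runs_after_change)
```

The two identical calls use compute only once; the third call runs again because the data changed in between.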

That’s quite a lot of features! Are there any others?

Yes, there are. We won’t go into detail about all of them, but here are some smaller yet interesting features.

  • Time-Travel Feature: this is the ability to view data from the past.
  • Copy-Clone: you can copy an existing database, and the copy is directly available for use.
  • ANSI-SQL
  • You can query directly on JSON, XML… files.
  • Streaming
  • Data Sharing
  • Active Directory Integration

Are there any aspects you would say Snowflake is not suited for?

You must keep in mind that Snowflake is, at its core, designed for handling Big Data, data warehouses and so on. If your interest or use case lies in Data Science, we would suggest you look at tools built with this in mind, such as Databricks and Spark. More info regarding Databricks can be found on our blog here.

Very interesting! How can I get in touch?

If you need help (development, architecture, strategy, …) with one of your BI implementations, Lytix can help you. Lytix believes in the Snowflake platform and knows its advantages and disadvantages inside out. Our ETL framework ‘XTL’ uses Snowflake as a database engine and can give you a head start in an implementation.

The XTL Framework & Technology Drivers