Key takeaways from Fabric
On May 23, 2023, a new kid on the block emerged in the Microsoft stack: Fabric. Is this simply a rebranding of Azure Synapse Analytics, or much more? According to Microsoft, this new tool focuses on simplicity by creating a unified experience built on a single workspace, single sign-on, single storage and a single security model. This blog explains some key takeaways from the launch of Fabric in May and from the updates in the following months. After reading it, you will be able to briefly explain the key innovations this tool brings, and chances are you will be eager to dive deeper into one of the topics yourself.
Key features of Fabric
OneLake is probably the most discussed topic concerning the launch of Microsoft Fabric. It emphasizes a fully lake-centric design where all your applications have one single place where the data resides. In this regard, OneLake can be compared to an organizational OneDrive, where all the data is centralized and accessible in a folder-wise structure (depicted in image 1). The data format in this centralized storage is the open Delta Lake format, which brings several advantages. Firstly, it is an open format, with forums and an active community to help with implementation and troubleshooting. Secondly, the different analytics engines in Fabric can access the same data using different languages (Spark, T-SQL, KQL and Analysis Services). Thirdly, third-party tools that rely on this format can work with the data directly. Lastly, it supports ACID transactions and time travel to access earlier versions of your data.
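The time-travel idea can be illustrated with a minimal, pure-Python toy (this is an analogy, not the Delta Lake API, and the class name is made up for illustration): each write appends an immutable snapshot, so earlier versions of the table remain readable.

```python
from copy import deepcopy

class ToyVersionedTable:
    """Toy model of Delta Lake-style versioning: every commit
    snapshots the table, so older versions stay readable."""

    def __init__(self):
        self._versions = []  # list of snapshots; list index = version number

    def commit(self, rows):
        # Each write creates a new immutable version (an append-only log).
        self._versions.append(deepcopy(rows))
        return len(self._versions) - 1  # version number of this commit

    def read(self, version=None):
        # "Time travel": read the latest version or any earlier one.
        if version is None:
            version = len(self._versions) - 1
        return self._versions[version]

table = ToyVersionedTable()
v0 = table.commit([{"id": 1, "status": "new"}])
v1 = table.commit([{"id": 1, "status": "processed"}])

print(table.read())            # latest version
print(table.read(version=v0))  # time travel back to the first commit
```

In real Delta Lake, this version log is what also makes ACID transactions and rollbacks possible: readers always see a consistent snapshot.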
Shortcuts further facilitate centralizing data in one place. They enable you to reference data across business domains without moving or copying the data itself. Consequently, you pay your storage costs only once and don't risk ending up with multiple copies of the data with slightly different characteristics. Fabric does a good job of making a shortcut to an external data location, such as Amazon S3 or another Azure Data Lake Storage Gen2 account, look just like a regular file in your OneLake.
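Conceptually, a shortcut behaves much like a filesystem symlink: a reference that can be read in place while the bytes live elsewhere. A small runnable sketch of that analogy (the folder and file names are hypothetical, and this is plain Python, not a Fabric API):

```python
import tempfile
from pathlib import Path

# A shortcut is conceptually similar to a symlink: a reference to data
# that lives elsewhere, readable in place without copying the bytes.
root = Path(tempfile.mkdtemp())

# The "owning" domain stores the one and only copy of the data.
source = root / "sales_domain" / "orders.csv"
source.parent.mkdir()
source.write_text("order_id,amount\n1,100\n")

# Another domain references it instead of duplicating it.
shortcut_dir = root / "finance_domain"
shortcut_dir.mkdir()
shortcut = shortcut_dir / "orders.csv"
shortcut.symlink_to(source)  # a reference, not a copy

# Both paths read the same single copy of the data.
print(shortcut.read_text() == source.read_text())  # True
```

The point of the analogy: the consuming domain sees a regular file, but there is still exactly one stored copy, so storage is paid once and the copies can never drift apart.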
Direct Lake mode is a new and better way to connect Power BI to your data. Before Fabric, you had only two options. Firstly, import mode loads your data into a Power BI dataset, which can then be refreshed either manually or on a predefined schedule. Secondly, DirectQuery mode enables live data in your reports by pushing queries in real time to the source. Now, with Direct Lake mode, your report scans the files in OneLake directly, without having to query an endpoint or duplicate the data into Power BI datasets.
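The trade-off between the first two modes can be shown with a tiny toy sketch (plain Python, with made-up names; Direct Lake sits in between by reading the lake files directly instead of either copying them or querying an endpoint):

```python
# Toy contrast of Power BI connection modes (hypothetical data):
# import mode copies data into a dataset snapshot; DirectQuery reads
# the source at query time.

source_table = [{"region": "EU", "sales": 100}]

# Import mode: a snapshot is taken at refresh time.
imported_dataset = list(source_table)

# The source changes after that refresh...
source_table.append({"region": "US", "sales": 200})

def direct_query():
    # DirectQuery: every report query hits the live source.
    return source_table

print(len(imported_dataset))  # 1 -> stale until the next refresh
print(len(direct_query()))    # 2 -> always sees the live data

# A "refresh" brings the imported snapshot up to date again.
imported_dataset = list(source_table)
print(len(imported_dataset))  # 2
```

Import mode trades freshness for query speed; DirectQuery trades query speed for freshness. Direct Lake aims to give both by scanning the Delta files in OneLake itself.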
Git integration and CI/CD
Another recent and great addition to Fabric is that Power BI datasets are integrated with Git. Previously, Power BI did not have an easy-to-use version control system for your datasets and reports. In a team, this was risky: colleagues could overwrite each other's work or be unable to go back to a previous version of a report. That is now history: teams can use branching strategies and CI/CD pipelines to push reports across multiple Power BI environments. While this is a huge improvement, a CI/CD setup in Fabric is still lacking for pipelines and ETL code. Hopefully, this will be updated so it looks more like the Git integration of Synapse.
Fabric really tries to position itself as a one-stop shop for all things data, as depicted in the image above. As you can see, components like Data Factory and Synapse are reused in Fabric, but they are now directly available as one bundle of tools. The Data Factory tab is used for orchestration and has received a small upgrade to its user interface, as you can see in Figure 4. In my opinion, it looks more user-friendly: testing a pipeline is no longer called debug but simply run, triggers are called schedules, and so on.
In the Synapse Data Engineering tab, you can create a lakehouse. A lakehouse is split into two sections: Files, where you can create folders and store all kinds of formats such as Delta, CSV and Parquet, and Tables, where your data is structured as tables. Both sections are depicted in the image below:
Next, there is also a Data Warehouse for teams that want to work in the traditional way with only SQL. It exposes a SQL endpoint, so you can connect it to your preferred tool, such as Azure Data Studio or SSMS.
In the last few decades, the need for streaming data has grown immensely. The Synapse Real-Time Analytics tab was created to answer this need, using the Kusto Query Language (KQL) to discover patterns and anomalies in your data.
Synapse Data Science, the last component of Fabric, facilitates the creation of AI models: you can train models in notebooks or from the VS Code IDE. In addition, you can leverage MLflow to track your models and Spark's machine learning library for training at scale. Like many other Microsoft services, Fabric will also leverage the new AI features from OpenAI: Copilot helps you write code, SQL statements and DAX queries, and build reports. This works much like ChatGPT, where you ask a question and get an immediate response, which can greatly speed up development.
In a lakehouse, a key component of Fabric, the data is structured according to the medallion architecture. In short, the data is split into three layers: bronze for landing raw data, silver for transformations, and gold for the final tables to report on. Each layer uses different Fabric tools: pipelines, dataflows and notebooks typically load data into the bronze layer; dataflows and notebooks handle transformations in the silver layer; and SQL endpoints or datasets serve the gold layer. Clearly, Microsoft embraces the lakehouse structure with the launch of Fabric.
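The flow through the three layers can be sketched in a few lines of plain Python (the data is made up; in Fabric, each step would be a pipeline, dataflow or notebook operating on Delta tables):

```python
# Bronze: raw data landed as-is, including duplicates and bad records.
bronze = [
    {"id": "1", "amount": "100"},
    {"id": "1", "amount": "100"},  # duplicate
    {"id": "2", "amount": "bad"},  # unparsable record
    {"id": "3", "amount": "250"},
]

# Silver: cleansed and typed - drop duplicates and invalid rows.
seen, silver = set(), []
for row in bronze:
    if row["id"] in seen or not row["amount"].isdigit():
        continue
    seen.add(row["id"])
    silver.append({"id": int(row["id"]), "amount": int(row["amount"])})

# Gold: an aggregated, report-ready table.
gold = {
    "total_amount": sum(r["amount"] for r in silver),
    "row_count": len(silver),
}

print(gold)  # {'total_amount': 350, 'row_count': 2}
```

The key design idea is that each layer has one job: bronze preserves the raw source faithfully, silver makes it trustworthy, and gold shapes it for consumption, so reports never depend directly on messy source data.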
With these key takeaways, you are now ready to set up your preferred data analytics approach in Fabric. It is also advisable to keep an eye on new releases, because Fabric is still in preview and thus constantly changing based on feedback and additional needs. Lastly, as always, it is very important to first check what your data needs are in order to create a solid data architecture. If you need any guidance in this process, don't hesitate to contact us!
Keep an eye on our website and socials: we are going to publish deep-dive articles on the different aspects of Fabric and what they can mean for your organization!
Note: This blog was written when Fabric was still in Public Preview. Some features may have changed by the time you read this. Features are described as they were at the moment of writing (mid-October 2023).
Joachim Depovere has been active for two years as a data consultant at Intellus. He is always eager to dive into new data tools and topics to bring customers' data platforms to the next level.