The Journey of attaining the Azure Data Engineer certificate

Introduction

On February 23, 2021, Microsoft released a new beta certification exam, Exam DP-203: Data Engineering on Microsoft Azure. It replaces the exams DP-200: Implementing an Azure Data Solution and DP-201: Designing an Azure Data Solution, which will retire on June 30, 2021. By passing either the two old exams or just the new one, you earn the Microsoft certification of Azure Data Engineer Associate. I embarked on this adventure and would like to tell you all about my journey.

Preparation

I set aside plenty of time for thorough preparation before the exam. At times it felt like studying for college exams again, albeit with a much shorter exam period.

The study material for this exam consists of 10 learning paths, which you can find on Microsoft Docs. The quantity of learning material is not to be underestimated, considering that each path in turn consists of multiple modules. For each module, an estimated duration is indicated to give you an idea of how long it will take to read and understand the theory. Each module ends with a knowledge check, where two or three multiple-choice questions test how much you remember of the topic. The learning paths address various topics, such as Azure data storage, Azure Databricks, data streaming, and Azure Synapse Analytics. A big part of the learning material is dedicated to this last subject.

In addition to the study material, I also did some practice exams on Udemy. On this website, example questions are provided in the form of half or full exams where you can test your knowledge against the clock. Some of the questions are the same as the multiple-choice questions at the end of each module, but there are also other, more thorough or concrete questions. Normally, you can find plenty of example questions from previous exams, but since this is a new (beta) exam, the real exam questions are not yet known. Nonetheless, taking these practice tests was a very good exercise in preparing for the real exam.

The Exam

Once I had worked through all the study material, I made an appointment to take the exam and chose the option to take it from home. In my familiar environment, at my own desk, I completed the exam using Pearson VUE's online proctoring technology. After uploading the necessary pictures of my ID, my room, and my face, I could start the online exam while being filmed and supervised the whole time.

My exam consisted of 61 multiple-choice questions, which I had 120 minutes to solve. First, there were 2 case studies, followed by about fifty standalone questions. Once I had answered the questions about the case studies, that part was closed, and I could not return to it once I started on the next batch of questions.

The questions cover different subjects, weighted as follows:

  • Design and implement data storage (40-45%)
  • Design and develop data processing (25-30%)
  • Design and implement data security (10-15%)
  • Monitor and optimize data storage and data processing (10-15%)

After the exam, I had the feeling that quite a few questions tackled the Synapse Analytics material, for example how to create external tables in SQL pools.
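For those unfamiliar with that topic, here is a minimal sketch of what creating an external table over data lake files can look like, submitted from Python via pyodbc. The server name, credentials, storage location, and table definition are all illustrative assumptions, not actual exam content.

```python
# Minimal sketch, not exam material: creating an external table in a
# Synapse SQL pool from Python via pyodbc. The server, credentials,
# storage location, and table definition below are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myworkspace-ondemand.sql.azuresynapse.net;"  # hypothetical workspace
    "DATABASE=demo;UID=sqladminuser;PWD=<password>"
)
cursor = conn.cursor()

# An external table points at files in the data lake instead of storing rows itself.
cursor.execute(
    "CREATE EXTERNAL DATA SOURCE SalesLake "
    "WITH (LOCATION = 'https://mydatalake.dfs.core.windows.net/sales');"
)
cursor.execute(
    "CREATE EXTERNAL FILE FORMAT ParquetFormat WITH (FORMAT_TYPE = PARQUET);"
)
cursor.execute(
    "CREATE EXTERNAL TABLE dbo.Sales (SaleId INT, Amount DECIMAL(10, 2)) "
    "WITH (LOCATION = '/2021/*.parquet', "
    "DATA_SOURCE = SalesLake, FILE_FORMAT = ParquetFormat);"
)
conn.commit()
```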

Overall, I found the questions quite difficult and was not sure whether I would pass. You need a score of 700 out of 1000 to pass the exam. Moreover, beta exams are not scored immediately. First, answers are gathered to check whether the quality of the questions meets expectations. Usually, you receive your exam score about two weeks after the exam comes out of its beta version. In my case, it took more than 5 suspenseful weeks after finishing my exam before I received the long-awaited email with 'Congratulations' as the subject.

Conclusion

Looking back, it was an interesting experience to take this beta exam. I believe there is more uncertainty associated with a beta exam than with a regular certification exam, because less information is available about the learning material and the questions.

As for the Data Engineering subject matter itself, I found the material very informative, and there were several new things I could use and implement directly in my work. During both the preparation and the exam, it helps a lot to already have some work experience with the different Azure services. Some learning paths include extensive exercises to get familiar with the tools, but this is not always the case.

Another thing that helped me was the practice exam questions: they give you an idea of what the exam questions will look like. Something I would really recommend to anyone planning to take this exam is to work through lots of example questions and practice exams. Doing plenty of exercises will really help you master the learning material.

Update

The exam DP-203 went live on May 4, 2021, so this certification exam is no longer in beta version.

Emma Willaert

In case you have questions regarding MS BI, don’t hesitate to contact Lytix for more information!  

Microsoft Ignite – The official Lytix Recap

Microsoft Ignite, the most prominent technology innovation event of the year, hosted by Microsoft, has just wrapped up. A lot has been going on during the past week: this 2021 edition was full of surprises and had some amazing announcements in store. Who better than Lytix to keep you up to date with the most recently launched releases in the world of Microsoft?

Power Automate Desktop, Microsoft's RPA (Robotic Process Automation) tool for desktop recording, records common or repetitive interactions with applications, either in a browser or in a native application, and can then use those recordings as the basis of an automated process. PAD will be made available to Windows 10 users at no additional cost. The ideal application to automate everyday tasks and free up time for higher-value work.

Microsoft just announced its new mixed-reality platform called Microsoft Mesh, which allows people in different physical locations to join and collaborate on projects from anywhere, using any device, including HoloLens, PC, tablets, and smartphones.

To help people achieve balance, collaborate smartly, and find focus wherever and whenever they need it, Microsoft announced insights and experiences that are part of Microsoft Viva, the employee experience platform built on Microsoft 365 that empowers people and teams to be their best from wherever they work. Microsoft Viva brings together communications, knowledge, learning, resources, and insights into an employee experience that integrates seamlessly into the apps you use every day, such as Microsoft Teams.

For some people, 2020 felt like the year work and learning moved home. For others, work continued to happen on-site, with the added challenges of staying safe on the job and learning new ways to connect with remote colleagues. But the reality is: 2020 is the year work and learning moved to the cloud. To cope with this changed work environment, Microsoft also released a ton of brand-new features for Microsoft Teams during Ignite 2021, shipping later this year: Microsoft PowerPoint Live integration, Microsoft Teams Connect for easier cross-tenant interaction, and our favorite: easily organizing interactive webinars with up to 1,000 attendees, scalable to a view-only broadcasting experience for up to 20,000 participants! Customizable registration forms will be available as well.

Azure Data Factory is a fully managed, easy-to-use, serverless data integration solution to ingest your on-premises, hybrid, and multi-cloud data. You probably already knew this. But what about the product feedback on UserVoice? Take a look at the 10 (!) new releases for ADF, fresh off the press!

The Power BI Premium Per User license lowers the threshold for SMEs with a strong analytical demand willing to opt in to premium capabilities such as XMLA endpoint support, paginated reports, AI, and deployment pipelines, without the necessity of paying at least $5,000 a month. PPU will become available at a price of €16.90 per user per month. Good to know: PPU will reach general availability on the 2nd of April, meaning you can test this license at no cost whatsoever until then! If you're interested in the what and how, make sure to check out one of our earlier blog posts.

In the slipstream of Microsoft Ignite, a unique Ignite Challenge program has just been released. The Microsoft Ignite Cloud Skills Challenge 2021 consists of 7 different challenges, each containing several modules, helping you continue skilling up your technical expertise! Not convinced yet? Well, all participants who complete at least one challenge are eligible for a free certification exam, which can be redeemed for a selection of certifications. This offer expires on April 1st, so make sure to grab this opportunity with both hands!

Wrap up

This article summarizes the most recent Microsoft Ignite, hosted virtually. Microsoft announced a lot of features for Microsoft Teams, Viva, Mesh, and more, to be released later this year. Eager to find out how these new technologies could benefit your company? Feel free to get in touch with me or any of my colleagues at Lytix.

Power BI and Spotify – A match made in heaven?

Data isn’t life. They say music is. Well, is it? Wouldn’t it be awesome to simply combine both worlds? Let’s find out if this could turn out to be a perfect marriage.

Today, with approximately 286 million users to date, including 130 million subscribers, Spotify can be considered one of the world's most popular music streaming services. In a remarkable coincidence, I happen to consume their service every once in a while myself. Actually, that is an understatement, as I stream music on basically a daily basis: during travel, during development, while exercising, or even when writing this blog at home.

What would we like to find out?

Regarding my streaming behavior, I came up with the following questions:

> What artists did I favor the most?

> What songs did I listen to the most?

> Can certain events, such as festivals, cause a change in my streaming behavior?

> When did I stream music?

What KPIs can we define to provide the answers?

Of course, we need to quantify these questions with KPIs. I believe there are two ways of measuring whether I like or dislike a specific song. The first is the time I actually listened to the song, which I will refer to as the listening time (in minutes). The second KPI is the song count: the number of times a song started, regardless of whether I manually pressed the play button or Spotify's autoplay started the song.
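To make these two KPIs concrete, here is a minimal sketch of how they could be computed with pandas, assuming the streaming history is available as play records with an msPlayed field (the format used by Spotify's data export); the field names are assumptions you may need to adapt:

```python
# Sketch: computing the two KPIs from a list of play records.
# The input format (artistName, trackName, msPlayed) mirrors Spotify's
# streaming-history export; adjust field names to your own data.
import pandas as pd

plays = pd.DataFrame([
    {"artistName": "Post Malone", "trackName": "Goodbyes", "msPlayed": 175_000},
    {"artistName": "Post Malone", "trackName": "Goodbyes", "msPlayed": 40_000},
    {"artistName": "Nickelback",  "trackName": "Rockstar", "msPlayed": 252_000},
])

kpis = plays.groupby(["artistName", "trackName"]).agg(
    listening_time_min=("msPlayed", lambda ms: ms.sum() / 60_000),  # KPI 1
    song_count=("msPlayed", "size"),                                # KPI 2
)
print(kpis)
```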

Where can we gather the data we need?

Spotify lets its users retrieve data through the Spotify API. Recently, they launched a refined version of this API which, at the moment of writing, is still in beta. I can only praise Spotify for how well they document their API. They offer a clear description of the process for connecting to the API, written-out call definitions, and even a test-environment console for developers. More information can be found here.
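As a minimal sketch, assuming you have already obtained an OAuth access token (token acquisition is omitted here), a call to the recently-played endpoint could look like this in Python:

```python
# Sketch: fetching recently played tracks from the Spotify Web API.
# Assumes an OAuth access token with the user-read-recently-played scope.
import requests

ACCESS_TOKEN = "<your-oauth-token>"  # placeholder

response = requests.get(
    "https://api.spotify.com/v1/me/player/recently-played",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"limit": 50},  # the maximum this endpoint allows per call
)
response.raise_for_status()

for item in response.json()["items"]:
    track = item["track"]
    print(item["played_at"], track["artists"][0]["name"], "-", track["name"])
```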

Evaluation time!

As we now have all the data in place, we can assemble our very own Spotify dashboards with Power BI. Let's analyze my streaming behavior of the past year by trying to formulate an answer to our predefined questions. 😊

What artists did I favor the most?

Post Malone seems to be the clear winner based on my streaming behavior; there's no doubt there. I also tend to have a preference for R&B, with Tyga and Blackbear claiming spots in my top 5 artists.

What is interesting here is that my top 5 artists differ slightly when switching between our two predefined KPIs. Spots 1-3 are identical, regardless of which KPI we use. However, in terms of song count, Nickelback and Loud Luxury are my respective numbers 4 and 5, whereas in terms of play time, Loud Luxury and Blackbear occupy those positions. This can have two possible explanations:

1) Songs by Nickelback are simply shorter than Blackbear songs

or

2) I'm excited to hear the refrains of Nickelback songs and get bored afterwards, making me switch songs once the refrain has passed by.

What songs did I listen to the most?

It won't surprise you that I listened more than 150 times to Post Malone's top songs, such as A Thousand Bad Times, Sugar Wraith, and Goodbyes.

Can certain events, such as festivals, cause a change in my streaming behavior?

Yes! Last August I attended the Sziget festival in Hungary. According to my streaming history, I enjoyed Post Malone's performance so much that in the following months he skyrocketed to become my number one streamed artist.

When did I stream music?

It turns out I'm actually getting value out of my monthly subscription. Over the past year, I streamed on 357 out of 365 days. My listening time stays more or less steady throughout the week. During the work week, I mainly stream music while traveling between home and work. And in case you were wondering: no, I'm not an early bird. 😊

Wrap-Up

This article described one way of visualizing your Spotify streaming behavior with Power BI. Of course, there are many more insights these reports can be enriched with. I tried to provide a high-level summary of how I created this analysis. If you're interested in the technical details or would like help solving your own data visualization challenges, do not hesitate to get in touch with me or any of my colleagues at Lytix.

All aboard the data train today, CHOO CHOO 🚂 

Lou Segers
Microsoft Analytics Consultant @ Lytix
Follow me on LinkedIn

Power Apps

In this day and age, apps are all around us: on our phones and laptops, and sometimes even in stores and restaurants. An app can support business processes such as consulting balances, inputting records, grouping products, planning activities, and making prognoses. However, to build those apps, someone with the right expertise needs to spend time developing an app and implementing it in a system that does not always fully support it. All this demands so much time that there is no way to simply build small apps for processes like keeping inventory of company assets or inputting KPIs. Unless you use Power Apps… From an app that allows you to order sandwiches to one that makes accounting easier: you can do it all with Power Apps.

Would you like to possess the skill set to develop your own app with the Power Platform? If you can imagine it, Power Apps can build it. Low-code and no-code to the rescue, enhancing your organization's digital agility. Feel free to subscribe to this online webinar using the following link.

What is Power Apps?

Power Apps is an app-building tool that is part of Microsoft's Power Platform. Its goal is to assist business users in creating their own applications while significantly reducing development time. You can build applications for desktop, tablet, AND mobile, depending on how you want to use them. It integrates well with O365 and can even be embedded in Power BI reports, and vice versa. This makes it a standardized platform whose account management falls under O365. Power Apps allows anyone with an O365 account within the organization to build apps in a simple way, with minimal code, for the whole organization. In just one day it is possible to build a value-adding application with requirements specific to the organization. You can link your Power App to external data sources or to Microsoft's Common Data Service (CDS). Using Power Automate on top of that, you can expand the possibilities enormously.
Besides assisting in basic processes in your organization, there is also an AI Builder in Power Apps. With this feature, you can scan business cards, forms, objects, and text in images. Most of these features are still in preview and require you to train an AI model.

See how we use PowerApps at Cubis & Lytix!

Common Data Service (CDS), you mean?

Think of CDS as a centralized data source for your Power App as well as for other tools like SharePoint and Power BI. However, if CDS is not the best choice for your organization, it is always possible to use other data sources.

Power Automate

Power Automate allows us to build workflows, guided and without code, across applications and cloud services, and makes it possible to create workflows in both small and large contexts. It automates time-consuming tasks with built-in AI. Like Power Apps, it falls under the Power Platform umbrella, with all the advantages that brings.

Our experience

Here at Lytix we keep up with the latest technologies, and that includes Power Apps. Power Apps itself is not that difficult to work with. That's why we at Lytix can deliver a Power App tailored to your company within 1-2 days tops!
Having said that, it is mainly the standardized platform and the "low code, no code" principle that make it possible to deploy perfectly well-functioning apps within 2 days. With the recent corona outbreak, for example, it would be possible to build a quick app with standard questions and connect it to a database for reporting within a couple of hours.
Of course, no one is expected to build advanced apps right from the start. That's why we offer guidance, processes, and best practices to our customers while developing new apps.

Customer Testimonials

We use Power Apps at one out of three customers. The apps we implement all have different uses: from managing sales goals and managing the marketing calendar to creating a master data environment (… and getting rid of manual Excel files).
Power Apps allows our customers to get an easy overview of how well a certain store, department, or branch is doing compared to its sales goals, as well as to change existing goals or fill in new ones. Another app makes it possible for our customers to easily connect promotions and customer segments to each other in the database. On top of that, our customers can also add new actions and customer segments straight into their database, allowing immediate reporting on them should they so desire.

In a nutshell

Power Apps benefits your organization in the following ways:

  • Quick and easy apps that support the company's business processes.
  • A familiar and standardized way of developing apps makes Power Apps accessible to employees all over the organization.
  • Low code, no code.
  • Permissions and accounts are managed within the O365 platform, making it a central point for account management.
  • The Common Data Service (CDS) allows the dataset to be centralized and used by other apps that are part of the Power Platform.

If you want to know more about Power Apps and the Power Platform or hire us, you can contact us through the following channels.

Blog by Tibo Vandermeersch

10 Performance tips to speed up your Power BI reports

Arriving at work, the first thing you do is open your Power BI report and go for a coffee while waiting for the report to render. Sound familiar?

Well, it shouldn't. Your reports should render instantly, especially if a large or C-level audience consults them on a regular basis. Luckily, there are some tips and tricks to melt your frustrations like snow in the sun.

1. Only keep necessary visuals

Every single additional visual needs time to render and will therefore slow down your report. And if, in some cases, such as when relying on bookmarks, you really see added value in enriching your report with a bunch of shapes or data visualizations, please do hide them whenever possible. Hidden visuals are not rendered, so they have no impact on report performance at all.

2. Use integers instead of text values whenever possible

String values do not compress as well as integer values. By replacing string values with integer values, preferably at the source level, both the .pbix file size and the total memory footprint will shrink considerably. Examples: adjust the data types of numeric codes to integer values where possible, and use 1/0 instead of TRUE/FALSE, as illustrated below.
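Power BI's Vertipaq engine compresses data in its own way, but the underlying principle is easy to demonstrate. A small, purely illustrative pandas comparison (exact numbers will differ from what Vertipaq achieves):

```python
# Illustration: integers occupy far less memory than equivalent strings.
# Vertipaq compresses differently, but the principle carries over.
import pandas as pd

n = 1_000_000
as_text = pd.Series(["TRUE", "FALSE"] * (n // 2))      # string flags
as_int = pd.Series([1, 0] * (n // 2), dtype="int8")    # 1/0 flags

print("text bytes:", as_text.memory_usage(deep=True))
print("int  bytes:", as_int.memory_usage(deep=True))
```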

3. Stick to Power BI’s default visuals

Report developers are often very tempted to scroll through the list of third-party visuals in the marketplace and end up importing them. In some specific business scenarios, they can actually be very suitable. However, please do use them with care. Some of the biggest performance wins I've ever made were accomplished by simply converting third-party visuals to default visuals.

4. Avoid bidirectional relationships in your data model

The number one performance killer. A bidirectional relationship between two tables means both tables cross-filter each other. This seriously hurts report performance, as every interaction is likely to cause a cascade of filtering actions, especially when your data model contains multiple enabled bidirectional relationships. Optimal data model design is key. Alternatively, the DAX CROSSFILTER() function can come in handy as well.

5. Understand the difference between duplicate and reference queries

A duplicated query simply serves as a one-time copy of the M query code, whereas referencing a query also pushes any changes made to the referenced query through to the queries referring to it. Does this impact report performance? No, but it goes without saying that the latter improves query maintainability, as you only need to adjust the referenced query.

6. Only load data which will be reported upon

What about the options to “enable load” and “include in report refresh”?

Many of you have probably come across these options when developing reports but don't know exactly how to interpret their function. When the "enable load" checkmark is unchecked, data won't be loaded into memory. The respective query will still be available in the query tab but will not clutter your data modeling or report view. This is often used for intermediary queries: intermediate results you want to refer to, or merge/append other queries with.

There is a clear distinction with the "include in report refresh" option: when this checkmark is unchecked, data simply won't be refreshed when you click the refresh button of your report. These query fields will still be visible in your data modeling and report view if you have the "enable load" option checked. This feature is perfect for preventing static data from refreshing redundantly. Additionally, incremental refresh can be configured as well, for example to prevent frozen historic data from refreshing unnecessarily. This feature used to be Premium-exclusive, but as of the end of February 2020 it is generally available for Pro users as well, so make sure to check it out!


7. Slicers can be real performance killers

These are quick wins: change your slicers to dropdowns instead of lists. When a user opens a report, all (non-hidden) visuals require render time: the time Power BI needs to actually load and show a data visualization. Slicer lists take longer to render than slicer dropdowns, as the former present their filter values on report initialization, while the latter only load filter values when a user opens the dropdown.


Even more important is the selection of the slicer visual itself. As stated before, you should try to stick to default visuals where possible instead of relying on third-party visuals from the marketplace. However, there are some value-adding visualizations in there as well. A clear example of a visual significantly contributing to user experience is the marketplace's so-called 'HierarchySlicer'. This visual, unsurprisingly, enables users to add multiple fields to one and the same slicer, creating a hierarchy. At the time I created the report, Power BI did not yet offer the possibility to add multiple fields to its default slicer. As of Power BI's February 2020 update, the default slicer now also supports hierarchies. In fact, one of the biggest performance wins I came across consisted of simply changing the third-party 'HierarchySlicer' to the Power BI default variant, reducing report render time by an astonishing 90%. Continue reading to learn how you can measure performance.

8. Overthink your data modeling technique

I've had fantastic experiences reporting on a dimensional model. Bottom line: creating the report's look and feel should only take up a small proportion of your total effort. Designing the right data model will largely decide whether your reports behave in a performant way. I am aware there are several ways to design a data model, but I can only strongly advise sticking to Kimball's approach. Other best practices you should take into account include:

> The use of surrogate keys

> The use of role-playing dimensions

> The determination of fact(s), possibly enabling parallel loads

> No snowflaking

Please do keep in mind that I only scratched the surface of dimensional modeling here. Always try to question your decisions on your journey to the ideal data model. Critical thinking is key. In most cases, an additional pair of eyes won't hurt.

9. Split granularity to lower cardinality

To perform calculations, one or more requests are sent to the storage engine where the data is stored. This can be either an external relational engine (DirectQuery) or Power BI's own storage engine, called the Vertipaq engine (Import). Due to the variety of DirectQuery connections, I'll focus on the latter. In contrast to a traditional SQL database, which evaluates data row by row, the Vertipaq engine stores data in memory in a columnar format.

With this small introduction to the Vertipaq engine, I want to draw your attention to how Power BI handles its imported data: the higher the cardinality of a column, the longer Power BI will need to perform its calculations upon that data. In other words: try to limit the number of distinct values of an attribute.

For example, you could split datetime fields into a date and a time field. Consequently, memory load will be reduced to a minimum, as both fields have a much lower cardinality than the original datetime field. This is very case-specific, but I hope you were able to grasp the idea; a small sketch follows below.
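A purely illustrative pandas sketch of the cardinality effect (the column names are made up for the example):

```python
# Illustration: splitting a datetime column into date and time parts
# drastically lowers per-column cardinality.
import pandas as pd

stamps = pd.Series(pd.date_range("2020-01-01", periods=100_000, freq="min"))

df = pd.DataFrame({
    "datetime": stamps,          # 100,000 distinct values
    "date": stamps.dt.date,      # at most 365 distinct values per year
    "time": stamps.dt.time,      # at most 1,440 distinct values (minutes per day)
})
print(df.nunique())
```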

10. Measure performance!

Get a clear view of how a report is currently performing so you can quantitatively measure and compare the outcome of any action you take. Power BI has a built-in feature, called the performance analyzer, enabling users to measure how much time is spent on rendering visuals and how much time is spent on executing DAX code. It even gives you the full extract of the DAX code, which you can analyze further in tools such as DAX Studio.

Remember the difference between Power BI's updated default slicer and the HierarchySlicer imported from the marketplace? I compared their performance on one of my reports using the performance analyzer.

Wrap-Up

The tips and tricks above are some of the actions you could opt for when tuning the performance of Power BI reports. I did not intend to assemble a complete list; there are several more options, but I can only advise starting with the ones described above. If none of them result in the performance you were hoping for, do not hesitate to get in touch with me or any of my colleagues at Lytix.

All aboard the data train today, CHOO CHOO 🚂 

Lou Segers
Microsoft Analytics Consultant @ Lytix
Follow me on LinkedIn