Interview Mattias De Coninck

Can you explain what kind of work you have been doing?

I am currently staffed full-time at a client in the retail sector, where I take on the role of BI consultant to report on both the operational and budgetary inquiries the client may have. The operational aspect mainly covers reporting on the uptime, downtime, availability, etc. of cloud resources. Here my previous experience as a software engineer really comes in handy, because I have worked with most of these cloud resources, which gives me a better understanding of which other metrics might be useful for the client.

What are your specialties? 

As previously mentioned, I used to be a software engineer at a Big Four company. This means I can easily find my way around most API integrations and can use code to solve issues that standard BI tools might not be able to handle. Besides the technical aspect, it also made me familiar with an agile way of working and the tools surrounding it (Jira, Azure DevOps, etc.).

What does an ideal project look like for you?

My ideal project involves tackling problems with an end-to-end approach. This starts on the architecture side, looking at the problem and determining which tools or methodologies should be used. It ends with actually implementing the proposed solution with the team. This ensures that you understand both the why and the how.

What does a typical working day look like for you? 

My days start at around 7 am; to be more accurate, they should start at 7 am. I have the habit of pressing the snooze button at least once. So when I start my day at 7.15 am, I immediately take a quick shower to wash off any remaining fatigue. Afterward, I grab a quick breakfast and make sure that I am in front of my laptop somewhat before 8 am. Most of the time I just navigate to Azure DevOps and start working on the first ticket in my backlog. At 12 o'clock I eat lunch with my girlfriend and take some time to catch up on how our days are going. Once back at the desk around 1 pm, I try to have a status sync with my team lead to tackle any blockers I may have encountered while working on the backlog. Once synced, I work until around 6 pm, then close my laptop and start preparing dinner. On Wednesdays and Fridays, my day ends with a Kyokushin karate class, which is perfect for unwinding at the end of the day.

What is your favorite holiday destination?

That would be Japan. I have been there twice already but feel like I have seen so little of the country. My fascination with Japan comes from the sport I practice, Kyokushin karate (full-contact karate). On my first visit, I had the privilege of training at the dojo of a former world champion in Tokyo. Besides the karate aspect, one must appreciate the unique culture and climate of the country. The beautiful imperial palaces are nothing like the robust castles we are used to in Europe. All in all, I really love the country and can’t wait to visit it a third time!

Composite Models

Imagine having your data model ready and everyone loves it.
Then, all of a sudden, you get this question:

“I have made this Excel file and I would like to combine it with the Sales dataset in Power BI.”

The next thing you do is analyze the request and, for some reason (maybe you don’t want to flood the data warehouse with Excel files, or maybe your ETL team is too busy with other requests), you have to say no.

The business user is quite disappointed, because it would really have solved that time-consuming task he or she had to do every week.

What if I told you that all the above can be avoided with a very simple solution?

Composite Models are the answer!

These models are a way to seamlessly combine different data sources in Power BI.

With composite models you can connect to your Azure Analysis Services (AAS) model or Power BI dataset and enrich them even further by connecting to a second model, adding flat files, or even just building measures on top of your AAS model.

This is all super exciting for both business users and BI developers as it will create even more analytical opportunities.

Now, I know what you are thinking.
Let’s take a look at this feature!

I love your eagerness, but before we dive in, you should know that this feature has the following limitations.

The following Live Connect multi-dimensional sources can’t be used with composite models:

  • SAP HANA
  • SAP Business Warehouse
  • SQL Server Analysis Services
  • Power BI datasets
  • Azure Analysis Services


As of December 2020, DirectQuery for Power BI datasets and Analysis Services is available in public preview.
If you would like to use composite models with Power BI datasets and AAS, you should enable this preview feature.

Now, with that out of the way, let’s get to the exciting part!

In our example, we will be using a live connection to a Power BI dataset and an imported Excel file.

First of all, we connect to the Power BI dataset we want to use.
Once this is done and you are ready to add a different source to the model, or just to add your own calculated columns, you will need to go to Transform Data.

You will be prompted with the above information.

As stated previously, a DirectQuery connection is required. This is why the DirectQuery for Power BI datasets and Analysis Services preview feature needed to be enabled.

It will also notify you that this change is permanent: you cannot revert it once it has been performed and saved. This is very important to keep in mind.

When we are ready to continue, we can click on the Add to Local Model button and a DirectQuery connection will be made.

You know it was successful when you see the following text in the bottom right corner.

Now we can do quite a few things to the existing model that we couldn’t do before.

For example, if you previously right-clicked on a table, you got the options below.

However, now we can actually create new columns, rename or hide tables, measures, and attributes, and create relationships.

This last one will be very handy when adding different sources to your model.

Now we can actually add a different data source to our model. This can range from Excel or SharePoint files to other Power BI datasets.

Once you add another data source you will once again be prompted with a message.

Of course, when combining different data sources through DirectQuery there is always the possibility that data can be included in queries sent from one data source to the other.

Always be certain of the sources you are pulling your data from.
Once you are sure this is no issue for you, we can continue.

Both models will now exist separately from one another and can be joined through a relationship you define if required.

Different visuals can now be built using data from both (or more) sources.

All of this is amazing and a great leap forward for both developers and advanced business users.

Power BI is a tool that keeps improving, and each iteration brings more capabilities to deliver high-quality dashboards to the end user in a short amount of time.

If you are interested in getting to know more about Composite Models or other Microsoft BI-related aspects you can always contact Lytix.


The Journey of attaining the Azure Data Engineer certificate

Introduction

On February 23, 2021, Microsoft released a new beta certification exam, Exam DP-203: Data Engineering on Microsoft Azure. It replaces the exams DP-200: Implementing an Azure Data Solution and DP-201: Designing an Azure Data Solution, which will retire on June 30, 2021. By passing either the two old exams or only the new one, you earn the Microsoft certification of Azure Data Engineer Associate. I ventured into this adventure and would like to tell you all about my journey.

Preparation

I planned enough time for a thorough preparation preceding the exam. Sometimes it felt like studying for exams in college again, albeit for a very short exam period.

The study material for this exam consists of 10 learning paths, which you can find on Microsoft Docs. The quantity of the learning material is not to be underestimated, considering that each path consists of multiple modules as well. For each module, an estimated time is indicated to give you an idea of how long it will take to read and understand the theory. Each module ends with a knowledge check, where two or three multiple-choice questions test how much you remember of the topic. Various topics are addressed in the learning paths, such as Azure Data Storage, Azure Databricks, data streaming, and Azure Synapse Analytics. A big part of the learning material is dedicated to the last subject.

Next to the study material, I also did some practice exams on Udemy. On this website, example questions are provided in the form of half or full exams, where you can test your knowledge against the clock. Some of the questions are the same as the multiple-choice questions at the end of each module, but there are also other, more thorough or concrete questions. Normally, you can find a lot of example questions from previous exams, but since this is a new (beta) exam, the real exam questions are not known yet. Nonetheless, taking these practice tests was a very good exercise to prepare for the real exam.

The Exam

Once I had processed all the study material, I made an appointment to take the exam and chose the option to take it from home. In my familiar environment, at my desk, I completed the exam with the help of the Pearson VUE technology. After uploading the necessary pictures of my ID, my room, and my face, I could start the online exam while being filmed and supervised the whole time.

My exam consisted of 61 multiple-choice questions, which I had 120 minutes to solve. First, there were 2 case studies, followed by about fifty standalone questions. Once I had answered the questions about the case studies, this part was closed, and I could not return to it once I had started with the next batch of questions.

The content of the questions covers different subjects, including:

  • Design and implement data storage (40-45%)
  • Design and develop data processing (25-30%)
  • Design and implement data security (10-15%)
  • Monitor and optimize data storage and data processing (10-15%).

After the exam, I had the feeling that quite a few questions tackled the Synapse Analytics material, for example how to create external tables in SQL pools.

Overall, I found the level of the questions quite difficult and was not sure if I would pass. You need a score of 700 out of 1000 to pass the exam. Moreover, beta exams are not scored immediately. First, answers are gathered to check whether the quality of the questions meets expectations. Usually, you receive your exam score about two weeks after the exam comes out of its beta version. In my case, it took more than 5 suspenseful weeks after finishing my exam before I received the long-awaited email with ‘Congratulations’ as the subject.

Conclusion

Looking back, it was an interesting experience to take this beta exam. I believe there is more uncertainty associated with a beta exam than is the case with a normal certification exam because there is less information available about the learning material and the questions.

As for the Data Engineering subject itself, I found the material very informative, and there were several new things I could use and implement directly in my work. During the preparation and the exam, it helps a lot if you already have some work experience with different Azure services. Some learning paths include extensive exercises to get familiar with the tools, but this is not always the case.

Another thing that was helpful for me was the practice exam questions; they give you an idea of what the exam questions will look like. Something I would really recommend to anyone who is planning to take this exam is to solve lots of example questions and practice exams. Doing many exercises will really help you master the learning material.

Update

The exam DP-203 went live on May 4, 2021, so this certification exam is no longer in beta version.

Emma Willaert


In case you have questions regarding MS BI, don’t hesitate to contact Lytix for more information!  

One Year at Lytix – Data Consultant Tibo Vandermeersch

I’ve been working at Lytix for one year now, and I thought it would be interesting to share my experience of this first year as a Data Analytics Consultant at Lytix.

Training


After the first day, the other starters and I immediately jumped into a training program that Lytix had put together. Just starting out as a Data Analytics Consultant, I of course needed this.

During this training, I learned and refreshed not only hard skills that are needed to execute my job but also soft skills that are involved with consultancy.
People coming straight out of school will start in a group of starters that all follow this training simultaneously.
Most of the trainings are also given by people from Lytix themselves, which immediately gives you, as a new person, someone to contact if you need guidance or expertise in a certain field of knowledge. And people here invite you to do so as well.

The training has definitely proven to come in handy. It’s also a good way to gauge the kind of knowledge and expertise that lives in Lytix.


Working experience


Even during the training period, I was already deployed at a customer under the guidance of a more senior profile, which allowed me to learn more on the job.

At Lytix they try not only to find the right person for the job at the customer but also the right customer for the person doing the job. Of course, it’s not always possible to do this but when it is possible, they think of the most suitable place for you.
I personally have been very happy with the customers I got to work with. I felt comfortable doing my job and also felt like I learned from my customer. You won’t only get feedback from your colleagues at Lytix but you often get it from your customer as well which is something I appreciate a lot.
I like to go about feedback in the following way: “There is no such thing as bad feedback; it’s just the way it’s delivered that can be bad.” Don’t be afraid to ask the customer whether you have any points to work on. Approach it in an open-minded fashion.

Lytix gives you opportunities to learn and gain experience pretty fast. This allowed me to work with the data of 7 different customers in one year. Working in this many contexts taught me how to deal with different kinds of expectations, how to get to grips with a dataset I didn’t understand yet, different ways of working, asking the right questions in the first phases of a project, and so on.


Development


At Lytix there is a lot of room for development, whether it’s something small or big. As long as it’s job-related, they will stand behind your proposal if you have one.

There are regular check-in moments to figure out where your personal interests are going in terms of self-development. Next to that, they can always recommend something in terms of development that might suit you. Being self-reflective doesn’t always mean you have to find all the answers yourself; simply asking someone what they think would be interesting for you is also a form of self-reflection.
I am, for example, someone who learns best by doing something and figuring it out, which gets put to good use at Lytix. I sometimes get challenged by my customers, and at those moments I feel most strongly that I am growing professionally. This gives you the opportunity to gain more knowledge.
And again: when a customer plans in a moment for personal feedback with you, I appreciate that a lot. It tells you a lot about how you are meeting expectations, but also about what you can work on yourself. They know that I’m young and don’t know everything, but they let you know what you can improve based on their experience. This also allows you to prevent mistakes.

There’s a lot to learn at Lytix, not only about the job content but also about yourself. For example, I have personally learned that I’m better at certain things than I initially thought.


Teamwork


Say that you’re at a customer and they ask you about something in which you have zero experience. You can always go ahead and ask someone at Lytix if they can help you out. Eventually, you will get there.

The people at Lytix and Cubis (Lytix’s sister company) know that not everybody can do everything. That is why it is easy to contact someone with more expertise in a certain subject. On my most recent project, a customer asked which option was better: connecting a customer portal to SAP directly (and how one would do it), or connecting SAP to Power BI and connecting the customer portal to that. Not knowing a lot about SAP, I went searching at our sister company, Cubis, for someone who did. Within a day a meeting was scheduled and an answer was given. After a couple of days (contract negotiations included), someone from Cubis was able to implement a solution that satisfied the customer.

Next to having easy access to expertise across platforms, there are also fun activities organized to build good, amicable relationships between employees. This way everyone gets to know each other a little better, and you end up knowing who has expertise in which subjects.

Apart from the things mentioned above, Lytix and Cubis have monthly internal status updates and regular knowledge-sharing sessions called “Bits & Bokes”, where someone from Lytix shares something worth knowing in a short 30-minute session during the lunch break.

Tibo Vandermeersch


In case you have questions regarding MS BI, don’t hesitate to contact Lytix for more information!  

Feature Store

Introduction

Everyone who has come into contact with data science has heard of the features used in such models. One aspect that can become quite challenging is reusing features in a consistent way across team members, projects and environments. In this article, I will explain the most commonly used way to resolve these problems: a feature store.

Catch up on the terminology used in this blog by reading:

– Things to consider when creating a Data Lake – https://lytix.be/things-to-consider-when-creating-a-data-lake/

– Kimball in a data lake? Come again? – https://lytix.be/kimball-in-a-data-lake-come-again/

– Managed Big Data: DataBricks, Spark as a Service – https://lytix.be/managed-big-data-databricks-spark-as-a-service/

Data lake and data warehouse

There are no shortcuts: before thinking of data science, data storage and collection are of vital importance. The image below depicts one possible way in which a data lake and data warehouse can be used to store data. Note that this setup is not fixed and varies strongly depending on the specific needs of the company.

Having your data decently structured allows your data profiles (data analysts, data scientists) to explore the data and investigate which features can be made and which features are useful for your model. This phase takes place before the actual industrialization of features and inevitably consists of trial and error. This is not the most ‘popular’ part of the job for a data scientist, but I still consider it an important part, as you need a good understanding of the data when you are building an ML model.
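To make this exploration phase a bit more concrete, below is a minimal sketch of what such a step could look like in pandas. The file paths, column names and the target label are purely illustrative assumptions, not part of any specific setup.

    # Minimal exploration sketch (pandas). All paths, columns and the
    # 'churned' target are hypothetical examples.
    import pandas as pd

    orders = pd.read_parquet("datalake/cleansed/orders.parquet")

    # Quick profiling of the raw data.
    print(orders.describe(include="all"))
    print(orders.isna().mean())  # share of missing values per column

    # Candidate feature: average order value per customer.
    candidate = (
        orders.groupby("customer_id")["amount"]
              .mean()
              .rename("avg_order_value")
    )

    # Check how the candidate relates to a known target (e.g. a churn flag).
    labels = pd.read_parquet("datalake/cleansed/churn_labels.parquet")
    check = labels.join(candidate, on="customer_id")
    print(check[["avg_order_value", "churned"]].corr())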


          “A ‘Feature’ is an attribute/column/aggregation that can possibly improve the accuracy of your model. A Feature Store improves the reusability of features, reducing lead times and duplicate logic.”


Feature store

Once features have been successfully identified and tested in a model, it is useful to think about industrialization. This allows the features to be reused in your own model, but also to be easily reused by other models (those of your colleagues, for example).


Input data for a feature store

The previously mentioned data warehouse is one input of the feature store. Several operations (sum, average, …) using SQL or Python (Pandas, PySpark) can be executed on the data to create features. In addition to data coming from the data warehouse, real-time data can also be used to make features (such as interactions on your website, clicks, events, etc.). Of course, for exploration purposes, this data can also be stored in a data lake or data warehouse. The real-time dimension of this data is especially useful when it is consumed by real-time models, which are discussed further on.
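As a concrete illustration, the sketch below shows how a few such aggregations could be computed with PySpark and written back as a feature table for reuse. The table and column names are assumptions made for this example only.

    # Minimal sketch (PySpark): deriving customer-level features from a
    # hypothetical 'dwh.sales' table. All names are illustrative assumptions.
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.appName("feature-engineering").getOrCreate()

    sales = spark.read.table("dwh.sales")  # assumed warehouse table

    customer_features = (
        sales
        .groupBy("customer_id")
        .agg(
            F.sum("amount").alias("total_amount"),
            F.avg("amount").alias("avg_basket_value"),
            F.countDistinct("order_id").alias("order_count"),
        )
    )

    # Persist the result as a reusable feature table (the offline store).
    customer_features.write.mode("overwrite") \
        .saveAsTable("feature_store.customer_sales_features")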


Feature store guidelines

When constructing such a feature store, I see the following important aspects that should be in place:

  • Cleaning data: it should be possible to use the features directly as input for the model. Thus it is necessary to handle missing data, normalize data (if necessary), perform dummy/one-hot encoding, etc.
  • Documentation: indicate and describe which features are present and how they are constructed. Details such as the aggregation used or the timeframe length are of big importance. When such information is unclear or unknown, the adoption of the feature store will be hard and data scientists will not know which features to use.
  • Monitoring and data validation: with monitoring, I do not only mean performance monitoring or monitoring that the load has succeeded. I also mean monitoring several characteristics of each feature, such as the distribution, the number of missing values, the number of categories, etc. When, all of a sudden, the characteristics of a feature change, it is very well possible that the performance of a model will no longer be as expected (i.e. data drift, which will cause model drift). In an ideal situation, a dashboard visualizing these statistics is built so all of this can easily be consulted (see the sketch after this list).
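To make the monitoring point more tangible, here is one possible sketch in pandas that computes basic statistics per feature and compares them to a stored baseline to flag potential data drift. The file paths, the baseline file and the drift threshold are assumptions for illustration.

    # Minimal monitoring sketch (pandas). Paths, the baseline file and the
    # 3-sigma threshold are illustrative assumptions.
    import pandas as pd

    features = pd.read_parquet("feature_store/customer_sales_features.parquet")

    stats = pd.DataFrame({
        "missing_ratio": features.isna().mean(),
        "mean": features.mean(numeric_only=True),
        "std": features.std(numeric_only=True),
        "n_unique": features.nunique(),
    })

    baseline = pd.read_parquet("feature_store/monitoring/baseline_stats.parquet")

    # Flag features whose mean shifted by more than 3 baseline standard deviations.
    drift = (stats["mean"] - baseline["mean"]).abs() > 3 * baseline["std"]
    print("Possible drift detected for:", drift[drift].index.tolist())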


Feature store types

We can identify an offline and an online feature store. The offline feature store is used for serving large batches of features to create train/test data sets and for batch applications. The online feature store can be used for an online model (e.g. served via a REST API). For the latter, preserving the real-time character of the data is especially important.


Offline feature store

This type of feature store consists of historical features for certain moments in the past that can be used to create training and testing datasets (e.g. training data for the years 2012-2018 and test data for 2018-2020). These features can be used ‘as is’ as input for the model. When companies have built a rich feature store, data scientists can quickly create new models, as they can skip most of the data exploration phase. However, in reality, it remains useful to check whether additional features can be created for the specific use case of the model. These new features can then in turn be industrialized again. As depicted in the image below, the real-time nature of the features is of less importance here, and they can be served from the data warehouse/lake (if stored there).
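As a small illustration, this is roughly what building a time-based train/test split from an offline feature store could look like in pandas. The file path, the event_date column and the churned target are assumptions for this sketch.

    # Minimal sketch: time-based train/test split from an offline feature store.
    # File path, 'event_date' and the 'churned' label are hypothetical.
    import pandas as pd

    features = pd.read_parquet("feature_store/customer_sales_features.parquet")

    train = features[(features["event_date"] >= "2012-01-01") &
                     (features["event_date"] < "2018-01-01")]
    test = features[(features["event_date"] >= "2018-01-01") &
                    (features["event_date"] < "2021-01-01")]

    X_train, y_train = train.drop(columns=["churned", "event_date"]), train["churned"]
    X_test, y_test = test.drop(columns=["churned", "event_date"]), test["churned"]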


Online feature store

In an online feature store, the real-time nature of the features is important, as such feature stores are primarily used to serve real-time and/or online models. These online feature stores are mostly row-based, with key-value pairs that can be retrieved with very low latency (e.g. with Redis or Cosmos DB).
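As an illustration, the sketch below shows how the latest feature values for one customer could be written to and read back from Redis as a hash of key-value pairs. The key layout and feature names are just assumptions for this example.

    # Minimal sketch of an online feature store backed by Redis.
    # Key naming convention and feature values are illustrative assumptions.
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # The feature pipeline writes/updates the latest values per customer.
    r.hset("features:customer:12345", mapping={
        "total_amount": 1842.50,
        "avg_basket_value": 61.42,
        "order_count": 30,
    })

    # At scoring time, the online model fetches the features with low latency.
    feature_vector = r.hgetall("features:customer:12345")
    print(feature_vector)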

Conclusion

Feature stores are of vital importance to speed up your model development and to have a mature production environment for deploying models. However, they should be constructed with significant thought; otherwise, adoption within the company will suffer and their value will easily be lost. If you need any help or have questions, please contact us!

Tom Thevelein


Big Data architect and Data Scientist

This blog was written by Tom Thevelein. Tom is an experienced Big Data architect and Data Scientist who still likes to get his hands dirty by optimizing Spark (in any language), implementing data lake architectures and training algorithms.