Feature Store


Everyone who has already come in touch with data science, has already heard of features used in such models. One aspect that can become quite challenging, is reusing features in a consistent way, across several team members, projects and in environments. In this article, I will explain the most commonly used way to resolve these problems: a feature store.

Catch-up on the terminology used in this blog by reading …

–        Things to consider when creating a Data Lake – https://lytix.be/things-to-consider-when-creating-a-data-lake/

–        Kimball in a data lake? Come again? – https://lytix.be/kimball-in-a-data-lake-come-again/

–        Managed Big Data: DataBricks, Spark as a Service – https://lytix.be/managed-big-data-databricks-spark-as-a-service/

Data lake and data warehouse

There are no shortcuts, before thinking of data science, data storage and collection are of vital importance. The image below depicts a possible of how a data lake and data warehouse can be used to store data. Note that this is not fixed and strongly varies depending on the specific needs of the company.

Having your data decently structured allows your data profiles (data analysts, data scientists) to explore the data and investigate which features can be made, and which features are useful for your model. This phase takes place before actual industrialization of features and inevitably consists of trial and error. This is not the most ‘popular’ part of the job for a data scientist, but I still consider it as an important part, as you need to have a good understanding of the data when you are building a ML model.


          “A ‘Feature’ is an attribute/column/aggregation that can possibly improve the accuracy of your model. A Feature Store improves reusability of features; reducing leadtimes and duplicate logic.”


Feature store

Once features have been successfully identified and tested in a model, it is useful to think about industrialization. This allows the features to be reused in your own model, but can also easily be reused by other models (of your colleagues, for example).


Input data for a feature store

The previously mentioned data warehouse is one input of the feature store. Several operations (sum, average, …) using SQL or Python (Pandas, PySpark) can be executed on the data to create features. In addition to data coming from the data warehouse, real-time data can also be used to make features (such as interactions on your website, clicks, events, etc. ).  Of course, for exploration purposes, this data can also be stored in a data lake or data warehouse. The real-time dimension of this data will be of special use in the consumption of it in realtime models, which are discussed further on.


Feature store guidelines

When constructing such a feature store, I see the following important aspects that should be in place:

  • Cleaning data: it should be possible to use the features directly as input of the model. Thus it is necessary to handle missing data, normalize data (if necessary), perform dummy/one hot encoding, etc. .
  • Documentation: indicate and describe which features are present and how these are constructed. Details such as the aggregation used or the timeframe length are of big importance. When such information is unclear or unknown, the adoption of the feature store will be hard and data scientists will not know which features to use.
  • Monitoring and data validation: with monitoring, I do not only mean performance monitoring or monitoring that the load has succeeded. I also mean monitoring several characteristics of each feature, such as distribution, number of features missing, number of categories, etc. . When, all of a sudden, the characteristics of a feature change, it is very well possible that the performance of a model will not be as expected anymore (i.e. data drift which will cause model drift). In an ideal situation, a dashboard visualizing these statistics is made so all of this can easily consulted.


Feature store types

We can identify an offline and an online feature store. The offline feature store is used for serving large batches of features to create train/test data sets and for batch applications. The online feature data storage can be used for an online model (e.g. via a REST API). For the latter, the preservation of the realtime character of the data is especially important.


Offline feature store

This type of feature store consists of historical features for certain moments in the past that can be used to create training and testing datasets (e.g. training data for the years 2012-2018 and test data for 2018-2020). These features can be used ‘as is’ as input in the model.  When companies have built a rich feature store, data scientists can quickly create new models, as they can skip most of the data exploration phase. However, in reality, it remains useful to check whether additional features can be created for the specific use case of the model.  These new features can then in turn be industrialized again. As depicted in the image below, the real-time nature of features is of less importance and can be used from the data warehouse/lake (if stored there).


Online feature store

In an online feature store, the real-time nature of the features is important, as such feature stores are primarily used to serve real-time and / or online models. These online feature stores are mostly row-based with key-value pairs that can be retrieved with very low latency (e.g., with Redis or Cosmos DB).


Feature stores are of vital importance to speed up your model development and to have a mature production environment for deploying models. However, they should be constructed with significant thought, otherwise adoption in the company and the use will be easily lost. If you need any help or have questions, please contact us!

Tom Thevelein

Tom Thevelein

Big Data architect and Data Scientist

This blog is written by Tom Thevelein. Tom is an experienced Big Data architect and Data Scientist who still likes to make his hands dirty by optimizing Spark (in any language), implementing data lake architectures and training algorithms.

Connecting and Using MS Graph in Azure Data Factory

Connecting and Using MS Graph in Azure Data Factory


Companies are creating more and more data, on which they want to gain insights. One valuable source of data is data from within the company itself, from the companies’ structure. For this type of data, the MS Graph API is something to look at.

The API provides a single endpoint to access all kinds of data from within your company. That could be from Office 365 services, such as MS Teams, Outlook/Exchange, … or Enterprise Mobility and Security services, such as Azure Active Directory (Users, groups, …) and Identity Manager, or data from Dynamics 365. This allows MS Graph to be used for many purposes. Once the API is implemented in a pipeline, only minor adjustments must be made if you want to use the API for another problem.

In this post, we will explain how to use the MS Graph API in Azure Data Factory. We will do this in several parts. First, we will explain how the API authentication is done. Then we will describe how to complete an App Registration and how to store its Client Secret in an Azure Key Vault. Finally, the different steps of building the pipeline in Data Factory will be explained in detail. Before we get to all this, a brief overview of a user-friendly tool to start exploring the API will be discussed.

MS Graph Explorer

Although the information about this API on the internet can seem overwhelming due to its many features, Microsoft provides a very nice tool to start playing with the API. MS Graph Explorer allows a user to get to know the API. Especially for users who have no experience with APIs, this tool can be very convenient. MS Graph Explorer offers a long list of sample queries, which the user can easily click on to get the desired data. After selecting the query, the request from the API will be displayed and a set of sample data for that particular query will be returned. MS Graph Explorer also allows you to log in with your personal Office 365 credentials, so the queries will return actual data of your organization.
If you want to check out this tool, you can go to https://developer.microsoft.com/en-us/graph/graph-explorer.

MS Graph Authentication

The authentication of the MS Graph API will be done with Client Credentials. For this, an App Registration is required. It should be noticed that it is best practice to do the authentication with a Managed Service Identity, since this makes it easier for an administrator to manage everything. When using Managed Service Identity, the required permissions need to be assigned. However, MS Graph currently does not allow to do this using Azure Portal, so this needs to be done with a PowerShell script. While this is not hard, there is decided to do the authentication with Client Credentials in this post, as it will allow users with no PowerShell experience to complete all the steps.

App Registration

Before we can use the API, we need something that will authorize the API to access the requested data. Azure App Registrations offers an easy way to do this.

  • In Azure Portal, go to ‘App Registrations’, and make a new registration.
  • When a new App Registration is created, open it and go to ‘API Permissions’ in the left ribbon.


  • Here we see an overview of the permissions of the App. By default, the App Registration has ‘User Read’ permission in MS Graph. This will not be sufficient, so new permissions need to be added. This is done by clicking on the ‘Add a Permission’ button.
  • In the window that opens, we click on ‘Microsoft Graph’, followed by ‘Application Permissions’. Now we get an overview of the possible permissions that can be granted. In our use-case we need a ‘Read All’ on the Groups and on the Users. Select this and click on ‘Add Permission’ below in the window.
  • An administrator needs to grant these permissions. Once this is done, the permissions should look like the ones below.
  • Before we can continue, we need two things. First, we need the ‘Application Client Id’. This can be found in the App Registration Overview. Copy this value in a temporary document, as it will be needed in a later step.

Second, we need the ‘Client Secret’. For this, we go to ‘Certificates & Secrets’ in the left ribbon, and click on ‘New client secret’. Give the Client Secret a name, set an expiration date and click on ‘Add’.

  • When the Client Secret is created, copy the Value. This value will only be visuable for a few minutes. After that time, it cannot be copied anymore, and a new Client Secret must be created

Key Vault

We do not want to place both the Application Client Id and the Client Secret hard coded in the Azure Data Factory pipeline. To make it more secure, we will store the Client Secret in an Azure Key Vault. It is possible to store the Application Client Id in the Key Vault as well, but that will not be addressed in this case.


  • Go to you Azure Key Vault. If you do not have a Key Vault yet, you can easily create a new one in Azure Portal.
  • In the Key Vault, go to ‘Secrets’ in the left ribbon, and click on ‘Generate/Import’. In the window that opens, fill in a Name and the Value. The Value is the Client Secret that we copied from the App Registration.
  • When the Secret is created, click on the Secret, followed by clicking on the current version. A window will open showing the ‘Secret Identifier’. Copy this temporarily in a document for later.

Azure Data Factory

Once the previous steps are done, we can start building the ADF Pipeline where the API Call will be done.
The pipeline we will build is structured as follows. It will start with a web activity where the App Registration Client Secret is retrieved from the Key Vault (GetAppRegSecret), followed by another web activity that will generate a Bearer Token (GetBearerToken). This Bearer Token is needed to authenticate the API in the next two copy activities. The first copy activity will call the MS Graph API for data of Azure AD Groups (GetADGroupsData), the second one will call the API for Azure AD Users data (GetADUsersData). The retrieved data will be moved to a Blob storage. After this, a lookup activity will collect the group data (LookupADGroups). Finally, for each group, a copy activity will transfer the unique ids of the members to the Blob storage (ForEachGroup). Each activity will be discussed more in detail below.

  • Open Azure Data Factory. Again, if the resource has not yet been created, this can easily be done in the Portal.


  • Create a new Web activity that will retrieve the App Registration Client Secret. In the ‘General’ tab, set the ‘Secure output’ to true. When the input or output of an activity is set to secure, it will not be logged for monitoring.
  • In the ‘Settings’ tab, set the URL to the secret identifier that was copied from the Key Vault. The Method is a ‘GET’. The Authentication should be set to ‘MSI’ and the Resource will be ‘https://vault.azure.net’.
  • Now we will add the Web activity that generates the Bearer token. For this activity, set ‘Secure Output’ and ‘Secure Input’ both to true. In the ‘Settings tab’, the URL should be ‘https://login.microsoftonline.com/<TenantID>/oauth2/v2.0/token’. Your TenantId can be found in Azure Portal. The Method is ‘POST’ and for the header we add one with the Name ‘Content-Type’ and Value ‘application/x-www-form-urlencoded’. In the Body we add dynamic content which can be seen below. In the body, the client_id is the Application Client Id of the App Registration. The scope is ‘https%3A%2F%2Fgraph.microsoft.com%2F.default’, the grant_type is ‘client_credentials’ and the client_secret is the output of the previously created activity.
  • Before we can add the Copy activities, we need to create a new dataset. In the left ribbon, go to Datasets and click on ‘New dataset’. Choose REST as data store.
  • For the Linked service, click on ‘New’.
  • Give the linked service a name, set the Base URL to ‘https://graph.microsoft.com’ and set Authentication type to ‘Anonymous’.
  • Select this Linked service for the new dataset, create a new dataset parameter and set the Relative URL to this parameter.
  • Now go back to the pipeline and add a Copy activity for the Azure AD Groups data. Set ‘Secure Input’ to true. In the Source tab, select the newly created dataset as ‘Source dataset’. For the RelativeUrl parameter, enter ‘v1.0/groups’. Add an Additional header with Name ‘Authorization’ and a Value with dynamic content, such that the value equals ‘Bearer <BearerToken>’. The BearerToken is the output of the previous web activity. Finally, add a Pagination rule with Name ‘AbsoluteUrl’ and Value ‘$[‘@odata.nextLink’]’. The MSGraph API allows a maximum of 999 objects per call. If there are more objects, a link will be added for a next call. ADF allows to solve this problem with Pagination rules. It should be noted that this Pagination rule can be nice to solve a pagination problem, but it is currently not very flexible to change. If the sink is a JSON file, this Pagination rule can cause problems. For this reason, the sink will be set to a parquet file, which will not give any problems with the pagination. For more info about the pagination, you can go to https://docs.microsoft.com/en-us/graph/paging
  • In the Mapping tab, select the columns you want to copy. It is important to check the box of ‘Collection reference’ for the Value.
  • Do the same for the Copy activity of data of the Users. But change the RelativeUrl field in the source to ‘v1.0/users’.
  • Now we add a Lookup activity. This activity will lookup all the groups with their group id. With this id, we can get for each group the members with the API request https://graph.microsoft.com/v1.0/groups/<group_id>/members
  • For the Lookup activity, fill in the required settings. The Source dataset is the sink of the previous copy activity. Set ‘First row only’ to false.
  • After the Lookup, add a ForEach activity to the pipeline. In the Settings, set the Items to the output value of the Lookup.
  • The final activity to add is a Copy activity in the ForEach loop. The settings of this activity are the same as the other two Copy activities, but for the RelativeUrl, we add dynamic content that we will use the Id of each group in the ForEach loop.
  • Now link the different activities like the image below.
  • The final step is executing the pipeline and start playing with the freshly retrieved data.


Although this blog only covered Azure AD data, you can easily access many more insights from your Office365 products with MS Graph. This is a big advantage of MS Graph, as it offers a single endpoint for multiple Microsoft services. All of this data can be easily obtained by changing a few parameters in the pipeline described above. This makes the MS Graph API an easy solution to multiple problems. A final point to note is that MS Graph can be used in multiple ways. Here it is used in Azure Data Factory, but the API can be called in a Python script as well.

If you have any questions about implementing this solution or how MS Graph can be used in your company, feel free to contact us.

About the Author: Jasper Puts


Jasper Puts is a Data Professional at Lytix. He can be found optimizing a customer’s analytics architecture and improving the accuracy of data science algorithms.

Microsoft Ignite – The official Lytix Recap

Microsoft Ignite, the most prominent technology innovation event of the year, hosted by Microsoft, is just around the corner. There has been a lot going on during the past week: this 2021 edition was full of surprises and had some amazing announcements in store. Who else than Lytix to keep you up-to-date with the most recently launched releases in the world of Microsoft?

Power Automate Desktop, Microsoft’s RPA (Robotic Process Automation) tool for desktop recording, records common or repetitive interactions with applications, either in a browser or in a native application, then can uses those recordings as the basis of an automated process. PAD will be made available to Windows 10 users at no additional cost. The ideal application to automate everyday tasks and free up time for higher-value work.

Microsoft just announced its new mixed-reality platform called Microsoft Mesh that allows people in different physical locations to join and collaborate on projects from anywhere and using any device including HoloLens, PC, tablets, and smartphones.  

To help people achieve balance, collaborate smartly, and find focus wherever and whenever they need it, Microsoft announced insights and experiences that are part of Microsoft Viva, the employee experience platform built on Microsoft 365 that empowers people and teams to be their best from wherever they work. Microsoft Viva brings together communications, knowledge, learning, resources, and insights into an employee experience that integrates seamlessly into the apps you use every day, such as Microsoft Teams.

For some people, 2020 felt like the year work and learning moved home. For others, work continued to happen on-site—with the added challenges of staying safe on the job and learning new ways to connect to remote colleagues. But the reality is: 2020 is the year work and learning moved to the cloud. To cope with this change of work environment, Microsoft released a ton of brand-new features for Microsoft Teams during Ignite 2021 as well which will ship later this year for Microsoft Teams. Microsoft PowerPoint Live integration, Microsoft Connect to easier interact cross-tenant and our favorite: easily organizing interactive webinars with up to 1.000 attendees, scalable to view-only broadcasting experience for up to 20.000 participants! Take a look at how customizable registration forms will be available:

Azure Data Factory is a fully managed, easy-to-use, serverless data integration solution to ingest your on-premises, hybrid, and multi-cloud data. You probably already knew this. But what about the product feedback on User Voice? Take a look at the 10 (!) new releases for ADF, fresh on!

The Power BI Premium Per User license lowers the threshold for SME’s with a strong analytical demand willingly to opt in on premium capabilities such as XMLA endpoint support, paginated reports, AI & deployment pipelines, without the necessity of paying at least $5.000 a month. PPU will become available at a price of €16.9 per user/month. Good to know: PPU will reach general availability on the 2nd of April, meaning you can test this license out at no cost whatsoever until then! If you’re interested in the what and how, make sure to check out one our earlier posted blog.

In the slipstream of Microsoft Ignite, a unique Ignite Challenge program has just been released. The Microsoft Ignite Cloud Skills Challenge 2021 exists in 7 different challenge, each containing several modules, helping you continue skilling up your technical expertise! Not convinced yet? Well, all participants who complete at least one challenge are eligible for a free certification exam, which can be redeemed to obtain the following certifications. This offer is due on April 1st, so make sure to grab this opportunity with both hands!

Wrap up

This article summarizes the most recent Microsoft Ignite, hosted virtually. Microsoft has a lot of features, to be released later this year for Microsoft Teams, Viva, Mesh and so on. Eager in how these new technologies could be beneficial for your company, feel free to get in touch with me or any of my colleagues at Lytix.

Lesson learned at the customer: Merging two tables to calculate On Time In Full (OTIF). [Part III]

In this final blog, we discuss the measures that have to be made.

Notethis is the third part of the blog. Missed the second part? Read it here.


From here on out we will continue with all the calculations that have to take place to get to the end result of the time aspect of OTIF.

Step 1

Still in the transformation window and having our new merged table selected, we will start of with adding a new column to the table.
Under the tab “Add Columns” we select “Custom Column”.


The first custom column that we are making is one where we calculate a date difference between the Requested Delivery Date and Actual Delivery Date.



Step 2

Repeat step 1. But now we are going to calculate an OnTime flag.


Step 3

Lastly, we are going to calculate the measures for the time aspect.
I split up the measures since this allows for cleaner code in general. We will only need 2 measures to calculate our OTIF time aspect. Be aware that these measures are probably not the exact answer to your problem. These are context and data model dependent but the ones you see here can help you on your way.

And there you go! You have now successfully calculated your OTIF time aspect values!
Below you can find an image of how we situated our table in the model.

In order to apply filtering on our new table, we placed the merged table between our Orders and Deliveries tables in our model with a one to many, single directional relationship to Deliveries and a one to many bidirectional relationship to Orders.

The reason for the bidirectional relationship is so that we are able to filter the table with our calendar dimension. Right now filtering on the calendar table will filter the Orders table which in it’s turn will filter the SalesDocumentSalesItemKey column in our Merge table but this could very well be a different column to your liking.

Tibo Vandermeersch

Tibo Vandermeersch

Tibo is a BI consultant with a mere focus on Power BI and data modelling. His interests mainly lie in DAX writing, development of Paginated Reports and Power Apps.

Lesson learned at the customer: Merging two tables to calculate On Time In Full (OTIF). [Part II]

Bringing everything together in one table

Notethis is the second part of the blog. Missed the first part? Read it here.

Step 1

Select the new Orders table you just made and click “Append Queries as New”


Now append the newly created Orders table with the newly created Deliveries table

If  you have more than 2 tables to append, select “Three or more tables” at the top of the Append window.

Step 2

Remove all columns except for the key column (SalesDocumentSalesItemKey).

Now you have one column containing the values of SalesDocumentSalesItemKey from both the Orders table and Deliveries table.

Step 3

It is expected to now have duplicates in this new appended column so we remove the duplicates again from this column. This leaves us with a table that contains unique key values that allow us to merge with the new Orders and Deliveries tables to extract the needed columns and their related values.


Step 4

Make sure you are now on the appended table and merge the appended table with the new Orders table to add the Requested Delivery Date column.

It’s not necessary to make a new table from this unless you want to segment your steps in different tables.

Go ahead and select the first table you want to merge with. In this case we will merge with the new Orders table.

Select the two key columns over which the join relationship will travel. Here we select SalesDocumentSalesItemKey for both tables.

After doing this you still need to make sure that the correct kind of join is selected. We chose for “Left Outer (all from first, matching from second)”.

Step 5

Right now your values are joined and you can expand the table again. It’s handy to include all columns so you can later on verify whether your join was made correctly. It’s good to check the box for “Use original column name as prefix” so you later know where the column comes from and which ones you can delete at the end.

Step 6

Repeat step 1-5 but with the new Deliveries table instead of Order.

Step 7

Now the columns from the new Orders table and the new Deliveries table are added in one big table.
Check if the values look correct. To be certain that you don’t have duplicates you can use the Group By again on the SalesDocumentSalesItemKey column (from the append step) to verify. Don’t forget to delete this navigation step.

Step 8

Remove the SalesDocumentSalesItemKey columns that you merged from the Orders and Deliveries tables and rename the date columns that are left to something more easy to work with.


Step 9

Since you have two tables now that won’t be used for reporting nor will they need to be added to the model, it’s good to uncheck “Enable Load” for these tables.

This blog will continue in part 3 where we discuss the measures that have to be made.

Tibo Vandermeersch

Tibo Vandermeersch

Tibo is a BI consultant with a mere focus on Power BI and data modelling. His interests mainly lie in DAX writing, development of Paginated Reports and Power Apps.

Lesson learned at the customer: Merging two tables to calculate On Time In Full (OTIF).

It’s always fun to make your customers happy and learn new stuff by simply just doing your job. That’s what we did at one of my customers lately. Us and a colleague (at the customer) figured out a way in Power BI to calculate the OTIF over multiple fact tables. For both beginner and intermediate profiles we thought this would be interesting to share.

And so we did! In this blogpost we will explain the process we went through to make the OTIF calculation happen.

What is OTIF?

  • To start with let us first explain what OTIF is for those who don’t understand it yet. “OTIF” stands for “On Time In Full”. This is a measure to see how many of your orders are delivered at the correct requested delivery date and how many of your orders are delivered in correct quantities.
    This means that there are two aspects, both represented as a percentage of the total SalesDocumentSalesItemKeys (in our situation), that have to be measured before being able to follow up on an OTIF analysis:

    • Time aspect
    • Quantity aspect

    The quantity aspect is simple. You compare the ordered quantity to the delivered quantity and if there is no difference, you count this as an “In Full delivery”.

    The Time aspect of the whole ordeal is less simple. We have to figure out a way to compare the Requested Delivery Date, situated in the Orders Table, to the Actual Delivery Date, which is the normal date column in our Deliveries Table.


The problem statement

Initially one would just make a measure to calculate the difference between the Requested Delivery Date from Orders and the Actual Delivery Date from Deliveries on the spot. Though this was an option, it wouldn’t have been the best one due to the mere size of the dataset and the fact that we are calculating this over multiple fact tables. That’s why we thought it was best to move this calculation to the transformation part of Power BI and use Power Query to go about calculating the Time aspect for OTIF.


The solution

The first thing we did was go to the transformation part of Power BI and reference (not duplicate) the Orders and Deliveries tables In order to make sure that any changes to the tables only have to be made in the original and not twice, once in the original and once in the copy.
These tables are copied because we will eventually bring these tables back to the columns that we will eventually need to bring together in one table. These columns are the following: 

  • From Orders:
    • SalesDocumentSalesItemKey
    • Requested Delivery Date
  • From Deliveries
    • SalesDocumentSalesItemKey
    • Calendar Day
      • This is the normal date column that will become the Actual Delivery Date later on.

The first table you start with, is the Orders table since this is the first table where  something happens first in a logistics related process. This is also the table where, we would assume, you can find the most SalesDocumentSalesItemKeys.


Where to Perform Transformations?

Within this blog, we will use PowerQuery for data-transformations. However, we could perfectly have done it using DAX expressions. Why do we prefer PowerQuery over DAX? As a general rule of thumb, we want to perform transformations as close to the source as possible. Hence, if the source system would’ve allowed data transformations (such as SQL) then we would’ve done that instead.


How to navigate to the Transform Data part of Power BI


Step 1

We started out by exploring our dataset and getting to know it. First we checked if there are duplicates in the SalesDocumentSalesItemKey column by using the Group By functionality in Power Query. This is also a good way to figure out whether your table is a historical table or if it already contains the most up-to-date record for every key.

After having figured out if there are any duplicates for your key column, don’t forget to delete this step. This is merely exploratory.

Step 2

Figure out which column causes duplication by filtering on one key (with duplicates of course) and searching for columns that show any differences in record values.
For us this was a record where stages of the process were being saved (i.e. for an order: processing – fulfillment – fulfilled).

Next you should go ahead and find the column that allows you to filter out the most up-to-date record for all of the instances of your key. In our example, the most recent record was the one with the latest “Created on” date.

Let’s call this kind of column the “MRC” (Most Recent Column) for future reference.

Step 3

Group By functionality was used to filter out the records with the most recent instance of the key based on the MRC. To do this you apply the Max operation on the MRC. Make sure to use the advanced section of the Group By to add the rest of the columns as well.

Your result will look something like the following:

For the column that’s generated which contains the rest of the table, click the arrows on the right-hand side of the column header. This opens the expand menu where you can select the columns you still need to include in the table you are creating.
Here you deselect all columns and search and select the column(s) you need for the calculation of the time aspect of your OTIF.

Now is also a good moment to rename the column(s) from its generated name to something more simple and meaningful.

Step 4

Now you can go ahead and delete the MRC column (here “Latest Record”) since you have no use for it anymore.

Step 5

Remove duplicates based on your key column (SalesDocumenSalesItemKey).



Step 1

Repeat step 1-3 for orders but instead of searching for the Requested Delivery Date in step 3, you search for the columns that contain the Actual Delivery Date.

We will keep referring to this column as “Actual Delivery Date”.
Don’t forget to rename the columns like you did in step 3.

Step 2

Correspondingly, you can now go ahead and remove the MRC column that you created with your Group By.

Step 3

You guessed it! It’s time to remove the duplicates again based on the SalesDocumentSalesItemKey.


This blog will continue in part 2, where we bring everything together in one table.

Tibo Vandermeersch

Tibo Vandermeersch

Tibo is a BI consultant with a mere focus on Power BI and data modelling. His interests mainly lie in DAX writing, development of Paginated Reports and Power Apps.