Connecting and Using MS Graph in Azure Data Factory

Introduction

Companies are creating more and more data from which they want to gain insights. One valuable source is data from within the company itself, about its own structure. For this type of data, the MS Graph API is worth a look.

The API provides a single endpoint to access all kinds of data from within your company. That could be data from Office 365 services, such as MS Teams or Outlook/Exchange, from Enterprise Mobility and Security services, such as Azure Active Directory (users, groups, …) and Identity Manager, or data from Dynamics 365. This allows MS Graph to be used for many purposes: once the API is implemented in a pipeline, only minor adjustments are needed to use it for another problem.

In this post, we will explain how to use the MS Graph API in Azure Data Factory. We will do this in several parts. First, we will explain how the API authentication is done. Then we will describe how to complete an App Registration and how to store its Client Secret in an Azure Key Vault. Finally, the different steps of building the pipeline in Data Factory will be explained in detail. Before we get to all this, we will briefly look at a user-friendly tool for exploring the API.

MS Graph Explorer

Although the information about this API on the internet can seem overwhelming due to its many features, Microsoft provides a very nice tool to start playing with the API. MS Graph Explorer allows a user to get to know the API; especially for users who have no experience with APIs, this tool can be very convenient. It offers a long list of sample queries the user can simply click on to get the desired data. After selecting a query, the corresponding API request is displayed and a set of sample data for that particular query is returned. MS Graph Explorer also allows you to log in with your personal Office 365 credentials, so the queries return actual data from your organization.
If you want to check out this tool, you can go to https://developer.microsoft.com/en-us/graph/graph-explorer.
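As an illustration, a simple query listing the users of the organization looks roughly like this (a sketch; the response is truncated and the exact fields depend on your tenant):

    GET https://graph.microsoft.com/v1.0/users

    {
      "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#users",
      "value": [
        { "id": "...", "displayName": "...", "userPrincipalName": "..." }
      ],
      "@odata.nextLink": "https://graph.microsoft.com/v1.0/users?$skiptoken=..."
    }

The actual data comes back in the 'value' array. The '@odata.nextLink' property only appears when there are more results than fit in one response; it will become relevant when we configure pagination in the pipeline below.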

MS Graph Authentication

The authentication of the MS Graph API will be done with Client Credentials. For this, an App Registration is required. Note that it is best practice to do the authentication with a Managed Service Identity, since this makes it easier for an administrator to manage everything. When using a Managed Service Identity, the required permissions need to be assigned; however, MS Graph currently does not allow this to be done through the Azure Portal, so it needs to be done with a PowerShell script. While this is not hard, we decided to use Client Credentials in this post, as this allows users with no PowerShell experience to complete all the steps.
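Concretely, the Client Credentials flow boils down to a single POST to the Azure AD token endpoint. A minimal sketch, in which the placeholders in angle brackets must be replaced with your own values:

    POST https://login.microsoftonline.com/<TenantID>/oauth2/v2.0/token
    Content-Type: application/x-www-form-urlencoded

    client_id=<ApplicationClientId>
    &scope=https%3A%2F%2Fgraph.microsoft.com%2F.default
    &client_secret=<ClientSecret>
    &grant_type=client_credentials

The response contains an 'access_token', which must be sent along with every MS Graph request in an 'Authorization: Bearer <token>' header. This is exactly what the Data Factory pipeline below will do.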

App Registration

Before we can use the API, we need an identity that can be authorized to access the requested data. An Azure App Registration offers an easy way to do this.

  • In Azure Portal, go to ‘App Registrations’, and make a new registration.
  • When a new App Registration is created, open it and go to ‘API Permissions’ in the left ribbon.
  • Here we see an overview of the permissions of the App. By default, the App Registration has the ‘User.Read’ permission on MS Graph. This will not be sufficient, so new permissions need to be added by clicking the ‘Add a permission’ button.
  • In the window that opens, we click on ‘Microsoft Graph’, followed by ‘Application permissions’. Now we get an overview of the permissions that can be granted. In our use case we need ‘Group.Read.All’ and ‘User.Read.All’. Select these and click on ‘Add permissions’ at the bottom of the window.
  • An administrator needs to grant these permissions. Once this is done, all permissions in the overview will show as granted.
  • Before we can continue, we need two things. First, we need the ‘Application (client) ID’. This can be found in the App Registration overview. Copy this value to a temporary document, as it will be needed in a later step.

Second, we need the ‘Client Secret’. For this, we go to ‘Certificates & Secrets’ in the left ribbon, and click on ‘New client secret’. Give the Client Secret a name, set an expiration date and click on ‘Add’.

  • When the Client Secret is created, copy the Value. This value is only visible for a short time after creation. After that, it cannot be copied anymore, and a new Client Secret must be created.

Key Vault

We do not want to hard-code the Application Client Id and the Client Secret in the Azure Data Factory pipeline. To make things more secure, we will store the Client Secret in an Azure Key Vault. It is possible to store the Application Client Id in the Key Vault as well, but that is not covered here.

  • Go to your Azure Key Vault. If you do not have a Key Vault yet, you can easily create one in the Azure Portal.
  • In the Key Vault, go to ‘Secrets’ in the left ribbon, and click on ‘Generate/Import’. In the window that opens, fill in a Name and the Value. The Value is the Client Secret that we copied from the App Registration.
  • When the Secret is created, click on the Secret, followed by clicking on the current version. A window will open showing the ‘Secret Identifier’. Copy this to a temporary document for later.
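The Secret Identifier is a URL of the following form (a hypothetical example):

    https://<vault-name>.vault.azure.net/secrets/<secret-name>/<version-id>

The Data Factory web activity in the next section will call this URL directly to retrieve the secret value.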

Azure Data Factory

Once the previous steps are done, we can start building the ADF pipeline in which the API calls will be made.
The pipeline we will build is structured as follows. It starts with a web activity that retrieves the App Registration Client Secret from the Key Vault (GetAppRegSecret), followed by another web activity that generates a Bearer token (GetBearerToken). This Bearer token is needed to authenticate the API calls in the next two copy activities. The first copy activity calls the MS Graph API for Azure AD Groups data (GetADGroupsData), the second one calls the API for Azure AD Users data (GetADUsersData). The retrieved data is written to Blob storage. After this, a lookup activity collects the group data (LookupADGroups). Finally, for each group, a copy activity transfers the unique ids of its members to Blob storage (ForEachGroup). Each activity is discussed in more detail below.
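Schematically, the chain of activities looks as follows (using the activity names introduced above; the Groups and Users copy activities can also be wired to run in parallel once the token has been retrieved):

    GetAppRegSecret (Web)
      -> GetBearerToken (Web)
        -> GetADGroupsData (Copy)
          -> GetADUsersData (Copy)
            -> LookupADGroups (Lookup)
              -> ForEachGroup (ForEach, containing one Copy activity)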

  • Open Azure Data Factory. Again, if the resource has not yet been created, this can easily be done in the Portal.
  • Create a new Web activity that will retrieve the App Registration Client Secret. In the ‘General’ tab, set the ‘Secure output’ to true. When the input or output of an activity is set to secure, it will not be logged for monitoring.
  • In the ‘Settings’ tab, set the URL to the Secret Identifier that was copied from the Key Vault, with ‘?api-version=7.0’ appended (the Key Vault REST API requires this parameter). The Method is ‘GET’, the Authentication should be set to ‘MSI’ and the Resource is ‘https://vault.azure.net’.
  • Now we add the Web activity that generates the Bearer token. For this activity, set ‘Secure Output’ and ‘Secure Input’ both to true. In the ‘Settings’ tab, the URL should be ‘https://login.microsoftonline.com/<TenantID>/oauth2/v2.0/token’. Your TenantId can be found in the Azure Portal. The Method is ‘POST’, and we add a header with Name ‘Content-Type’ and Value ‘application/x-www-form-urlencoded’. In the Body we add dynamic content that assembles the token request (a sketch is given after this list). In the body, the client_id is the Application Client Id of the App Registration, the scope is ‘https%3A%2F%2Fgraph.microsoft.com%2F.default’, the grant_type is ‘client_credentials’ and the client_secret is the output of the previously created activity.
  • Before we can add the Copy activities, we need to create a new dataset. In the left ribbon, go to Datasets and click on ‘New dataset’. Choose REST as data store.
  • For the Linked service, click on ‘New’.
  • Give the linked service a name, set the Base URL to ‘https://graph.microsoft.com’ and set Authentication type to ‘Anonymous’.
  • Select this Linked service for the new dataset, create a new dataset parameter (e.g. ‘RelativeUrl’) and set the Relative URL to this parameter.
  • Now go back to the pipeline and add a Copy activity for the Azure AD Groups data. Set ‘Secure Input’ to true. In the Source tab, select the newly created dataset as ‘Source dataset’. For the RelativeUrl parameter, enter ‘v1.0/groups’. Add an Additional header with Name ‘Authorization’ and a Value with dynamic content, such that the value equals ‘Bearer <BearerToken>’, where the Bearer token is the output of the previous web activity (see the sketch after this list). Finally, add a Pagination rule with Name ‘AbsoluteUrl’ and Value $['@odata.nextLink']. The MS Graph API returns a maximum of 999 objects per call; if there are more objects, a link for the next call is added to the response, and ADF can follow these links with Pagination rules. Note that while this Pagination rule solves the pagination problem nicely, it is currently not very flexible. If the sink is a JSON file, the Pagination rule can cause problems; for this reason, the sink is set to a Parquet file, which handles the pagination without issues. For more info about pagination, see https://docs.microsoft.com/en-us/graph/paging.
  • In the Mapping tab, select the columns you want to copy. It is important to check the ‘Collection reference’ box for the ‘value’ collection.
  • Do the same for the Copy activity for the Users data, but change the RelativeUrl field in the source to ‘v1.0/users’.
  • Now we add a Lookup activity. This activity will look up all the groups with their group id. With this id, we can get the members of each group with the API request https://graph.microsoft.com/v1.0/groups/<group_id>/members.
  • For the Lookup activity, fill in the required settings. The Source dataset is the sink dataset of the Groups copy activity. Set ‘First row only’ to false.
  • After the Lookup, add a ForEach activity to the pipeline. In the Settings, set the Items to the output value of the Lookup.
  • The final activity to add is a Copy activity inside the ForEach loop. Its settings are the same as those of the other two Copy activities, but for the RelativeUrl we add dynamic content that uses the id of each group in the ForEach loop (see the sketch after this list).
  • Now link the different activities in the order described at the start of this section.
  • The final step is to execute the pipeline and start playing with the freshly retrieved data.
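For reference, below is a sketch of the dynamic content expressions referred to in the steps above. They assume the activity names used in this post (GetAppRegSecret, GetBearerToken, LookupADGroups), the dataset parameter RelativeUrl and a group id column mapped as ‘id’; adjust them to your own naming.

The Body of the GetBearerToken web activity:

    @concat('client_id=<ApplicationClientId>',
        '&scope=https%3A%2F%2Fgraph.microsoft.com%2F.default',
        '&client_secret=', activity('GetAppRegSecret').output.value,
        '&grant_type=client_credentials')

The Authorization header on the copy activities:

    @concat('Bearer ', activity('GetBearerToken').output.access_token)

The Items of the ForEach activity:

    @activity('LookupADGroups').output.value

The RelativeUrl of the Copy activity inside the ForEach loop:

    @concat('v1.0/groups/', item().id, '/members')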

Conclusion

Although this blog only covered Azure AD data, MS Graph gives you easy access to many more insights from your Office 365 products. This is a big advantage of MS Graph: it offers a single endpoint for multiple Microsoft services, and all of this data can be obtained by changing just a few parameters in the pipeline described above. This makes the MS Graph API an easy solution to multiple problems. A final point to note is that MS Graph can be used in multiple ways: here it is used in Azure Data Factory, but the API can just as well be called from, for example, a Python script.


If you have any questions about implementing this solution or how MS Graph can be used in your company, feel free to contact us.

About the Author: Jasper Puts

Jasper Puts is a Data Professional at Lytix. He can be found optimizing a customer’s analytics architecture and improving the accuracy of data science algorithms.

Microsoft Ignite – The official Lytix Recap

Microsoft Ignite, the most prominent technology innovation event of the year, hosted by Microsoft, has just taken place. There has been a lot going on during the past week: this 2021 edition was full of surprises and had some amazing announcements in store. Who else than Lytix to keep you up to date with the most recently launched releases in the world of Microsoft?

Power Automate Desktop, Microsoft’s RPA (Robotic Process Automation) tool for desktop recording, records common or repetitive interactions with applications, either in a browser or in a native application, and can then use those recordings as the basis of an automated process. PAD will be made available to Windows 10 users at no additional cost. It is the ideal application to automate everyday tasks and free up time for higher-value work.

Microsoft also announced its new mixed-reality platform, Microsoft Mesh, which allows people in different physical locations to join and collaborate on projects from anywhere, using any device: HoloLens, PC, tablet or smartphone.

To help people achieve balance, collaborate smartly, and find focus wherever and whenever they need it, Microsoft announced insights and experiences that are part of Microsoft Viva, the employee experience platform built on Microsoft 365 that empowers people and teams to be their best from wherever they work. Microsoft Viva brings together communications, knowledge, learning, resources, and insights into an employee experience that integrates seamlessly into the apps you use every day, such as Microsoft Teams.

For some people, 2020 felt like the year work and learning moved home. For others, work continued to happen on-site, with the added challenges of staying safe on the job and learning new ways to connect to remote colleagues. But the reality is: 2020 is the year work and learning moved to the cloud. To cope with this change of work environment, Microsoft announced a ton of brand-new features for Microsoft Teams during Ignite 2021, shipping later this year: Microsoft PowerPoint Live integration, Microsoft Connect for easier cross-tenant collaboration and, our favorite, interactive webinars with up to 1,000 attendees, scalable to a view-only broadcasting experience for up to 20,000 participants! Customizable registration forms will be available as well.

Azure Data Factory is a fully managed, easy-to-use, serverless data integration solution to ingest your on-premises, hybrid, and multi-cloud data. You probably already knew this. But what about the product feedback on UserVoice? Take a look at the 10 (!) new releases for ADF, fresh off the press!

The Power BI Premium Per User license lowers the threshold for SMEs with a strong analytical demand that want to opt in to premium capabilities such as XMLA endpoint support, paginated reports, AI and deployment pipelines, without the necessity of paying at least $5,000 a month. PPU will become available at a price of €16.90 per user/month. Good to know: PPU will reach general availability on the 2nd of April, meaning you can test this license at no cost whatsoever until then! If you’re interested in the what and how, make sure to check out one of our earlier blog posts.

In the slipstream of Microsoft Ignite, a unique Ignite Challenge program has just been released. The Microsoft Ignite Cloud Skills Challenge 2021 consists of 7 different challenges, each containing several modules, helping you continue skilling up your technical expertise! Not convinced yet? All participants who complete at least one challenge are eligible for a free certification exam, which can be redeemed for one of a list of certifications. This offer runs until April 1st, so make sure to grab this opportunity with both hands!

Wrap up

This article summarizes the most recent Microsoft Ignite, hosted virtually. Microsoft has a lot of features to be released later this year for Microsoft Teams, Viva, Mesh and more. Eager to find out how these new technologies could be beneficial for your company? Feel free to get in touch with me or any of my colleagues at Lytix.

Power Apps

In this day and age, apps are all around us: on our phones and laptops, even in stores and restaurants. An app supports business processes such as consulting balances, inputting records, grouping products, activity planning and making prognoses. However, to make those apps, someone with the right expertise needs to spend time building the app and implementing it in a system that does not always fully support it. All this demands so much time that there is no way to simply make small apps for processes like keeping inventory of company assets or inputting KPIs. Unless you use Power Apps… From an app that allows you to order sandwiches to one that makes accounting easier: you can do it all with Power Apps.

Would you like to possess the skill set to develop your own app with the Power Platform?
If you can imagine it, Power Apps can build it. Low-code and no-code to the rescue to enhance your organization’s digital agility. Feel free to subscribe to this online webinar by using the following link.

What is Power Apps?

Power Apps is an app-building tool that is part of the Power Platform from Microsoft. Power Apps’ goal is to assist business users in creating their own applications and to significantly reduce development time. You can make applications for desktop, tablet AND mobile, depending on how you want to use them. It integrates well with O365 and can even be embedded in Power BI reports and vice versa. This makes it a standardized platform for which account management falls under O365. Power Apps allows anyone with an O365 account within the organization to build apps in a simple way, with minimal code, for the whole organization. In just one day it is possible to make a value-added application with requirements specific to the organization. You can link your Power App with external data sources or even with Microsoft’s ‘Common Data Service (CDS)’. Using Power Automate on top of that, you can expand the possibilities by a huge factor.
Next to assisting in basic processes in your organization, there is also an AI Builder in Power Apps. With this feature, you can scan business cards, forms, objects and text in images. Most of these features are still in preview and require you to train an AI model.

See how we use PowerApps at Cubis & Lytix!

Common Data Service (CDS), you mean?

Think of CDS as a centralized data source for your Power App as well as for other tools like SharePoint and Power BI. However, if CDS is not the best choice for your organization, it is always possible to use other data sources.

Power Automate

Power Automate allows us to build workflows in a guided way, without code, connecting apps and cloud services, and makes it possible to create workflows in a small or large context. It automates time-consuming tasks with built-in AI. Like Power Apps, it falls under the Power Platform, with the added advantages that brings.

Our experience

Here at Lytix we keep up with the latest technologies, and this includes Power Apps. Power Apps in itself is not that difficult to work with. That’s why we at Lytix can deliver a Power App specific to your company within 1-2 days tops!
Having said that, it is mainly the standardized platform and the “low code, no code” principle that make it possible to deploy perfectly well-functioning apps within 2 days. With the recent corona outbreak, for example, it would be possible to build a quick app with standard questions and connect it to a database for reporting within a couple of hours.
Of course, no one is expected to build advanced apps right from the start. That’s why we offer guidance, processes and best practices to our customers while developing new apps.

Customer Testimonials

At one out of three customers, we use Power Apps. The apps we implement all have different uses: from managing sales goals and managing the marketing calendar to creating a master data environment (… and getting rid of manual Excel files).
Power Apps allows our customers to get an easy overview of how well a certain store/department/branch is doing compared to its sales goals, as well as letting them change existing goals or fill in new ones. Another app makes it possible for our customers to easily connect promotions and customer segments to each other in the database. Next to that, our customers can also add new actions and customer segments straight into their database, allowing immediate reporting on them if so desired.

In a nutshell

Power Apps benefits your organization in the following ways:

  • Quick and easy apps to support the company’s business processes.
  • A familiar and standardized way to develop apps makes Power Apps more accessible to employees all over the organization.
  • Low code, no code.
  • Permissions and accounts are managed within the O365 platform, making it a central point for account management.
  • The Common Data Service (CDS) allows the dataset to be centralized and used by other apps that are part of the Power Platform.

If you want to know more about Power Apps and the Power Platform or hire us, you can contact us through the following channels.

Blog by Tibo Vandermeersch

This is how we keep our team spirit high!

As data & analytics consultants, our goal is to spread our KNOWLEDGE on our clients’ projects. Next to applying and increasing our knowledge, one of our core values is FUN.
A crisis should not stop this! It is definitely more challenging for us to organize events that adhere to government policies. But we don’t mind a challenge: we accept it instead of ignoring it.


That’s why we chose to come online all together on Thursday evening. We started at 8 pm with an online quiz, dividing our team into a number of groups that competed with each other. Of course, we weren’t allowed to use our good friend Google. The quiz, which consisted of 40 questions, took about an hour to complete. In my opinion, it was a success! However, let’s wait for the results 😉

Afterward, we started some online games, such as Saboteur and some card games. We deliberately chose some games in which we could all participate. Paired with a local beer and a joke, it was also the perfect opportunity to catch up with each other.

The Covid-19 virus is here and now; we can’t change that. What we can change is the way we deal with it. We are opting for the “all alone together” method, where we stay in contact with each other on a regular basis. Let’s prioritize health and let’s stay safe! Every life matters.

PS: Please don’t ask about our scores on the quiz, it will be better next week. Promised!😀

Blog by Jorik Spiesschaert

Governed Corporate Power BI

Best wishes and many Data Insights. 

Oleon, my client, is at the forefront of the growing market of natural chemistry. As lead of the Analytics team at Oleon, it is my desire to provide useful, easy-to-read, business-driven reports.

In a previous article I pondered the question of SAC vs. Power BI for visualization, among other things.

Now, 1.5 years later, we are still confident in the combination of Power BI on a BW 7.5 analytics system and HANA 2.0. And during the holiday season, I have the luxury of being able to look back.

We are using Power BI in the cloud: easily accessible, with automatic login on corporate laptops. End-users and self-service users only need a browser. For handheld devices there is also an app, should the user prefer it over a browser.

Learning Curve of Power BI

As with any new tool, there is a learning curve. For end-users and self-service users it is close to instant. Almost all the questions I get from this side are functional, as it should be. Adding an information page or a link to definitions is a good help here.

For power users and developers, the plus of Power BI is that the first learning bump is easy to take and fills your head with grand dreams and a hunger for more. The second learning bump (programming with DAX and modelling for advanced visuals/calculations) is much higher.

Modelling

Important to know and learn is where to do which coding/modelling/manipulation, and to get everybody in the team on the same page. This takes time and there is no easy answer.

With BW on HANA you can model/code on the source system (usually SAP), in BW layers (the old LSA+, which with HANA you try to minimize to the essentials, virtualizing the rest), in HANA Calculation Views and finally in Power BI itself. (Personal note: I prefer to link Power BI with HANA views, as I find it more stable/mature.)

Classic SAP consultants will model too little in Power BI and classic Microsoft consultants vice versa, both making it far more complex than needed. Both need to understand each other’s strengths for optimal results in the most efficient way. Coming from a strong SAP background myself, I had to learn that modelling with master data next to transaction data in Power BI has benefits in certain scenarios (filter context, SUMX, ..).

This approach requires shifting more logic from the reports (BEx queries) to the HANA views. Only aggregation-dependent logic and logic for visualization are still needed in your reporting layer. The plus side is that this approach makes it easier to switch to a new reporting tool in the (far?) future.

Source systems

Power BI will happily let you combine data coming from all sorts of source systems, on-premise or not, production and development alike, so you will need clear guidelines to prevent blunders (such as reporting dev data on a production system). We of course use the ‘import’ method, since we do not use an SQL layer in Azure or elsewhere.

A follow-up on this is flat files. The classic SAP approach is to upload your files into an aDSO, transport this set-up to production, and publish it to reporting (2 to 4 hours to set up). The advantage is that this process is more IT-governed, with more fail-safes in case of data entry errors. The alternative is to place flat files on OneDrive (easier to set up) or a network drive and refresh the data. This set-up is way faster, typically taking about 5 minutes, and the data refresh can also be automated via the Gateway. This second approach can be applied for files that are and will be relevant only for this one report and are maintained by IT.

Visuals

In your first hours of using Power BI you will quickly dive into the marketplace for EXTRA fancy visuals, only to find out later that the better visuals there cost money. Much later you discover that these marketplace visuals have caveats: they lack e.g. language translations, behave differently for tooltips, or don’t work in the first days after a new version. In a corporate environment, you will mainly want to stick to the visuals provided by Microsoft.

Governance

Governance is absolutely essential; the playful joy of Power BI almost makes you forget this, potentially catapulting your company back to 1980. Governance is a chapter on its own. My wonderful colleague Sander from Lytix wrote a short take on this.

His blog can be read on this link: https://lytix.be/governed-self-service-a-power-bi-rollout/

Final thought

Power BI has a low entry threshold, still offers the option of complex constructs when needed, and comes with an excellent distribution platform in the cloud. There is a large user base and a lot of information, YouTube videos and much more. Visualisation is the tip of the analytics iceberg, but it is the part that users see and interact with. This layer has to be (and is) easy to read, quick for pinpointing opportunities, and trouble-free and time-saving to use, inviting users to look more at their data.

Blog by Carl Goossenaerts