Monitoring Power BI usage metrics using Azure Data Factory
When you have Power BI integrated into your company, it is very important that the tenant of the Power BI service is managed well. But how do you do this? How do you measure which reports are consumed the most, which datasets are used the most, which employees are the most active on Power BI? In this blog we will dive a bit deeper in the possibilities you have for analyzing the usage statistics of Power BI. We will explain in detail how to set up the method that offers a lot of flexibility.
There are 3 options. The first one is in the admin portal; hence you need to be Power BI admin to access this report. In the tab Usage metrics, you can find a report with some global statistics. The report is however not interactive, and it only focuses on the number of reports, dashboard and datasets instead of “who is producing a lot of reports”.
The second option is found in the workspace. When you click on the 3 dots next to a report, you have the option “View usage metrics report”. This report is interactive and can be edited. It displays a nice view of the usage statistics across a workspace. However, when you have more than a handful of workspaces, this is not scalable. Hence, this does not allow for a global overview where you can compare what happens in different workspaces.
The third option is to extract the data using the Power BI REST API with Azure Data Factory. This means you can process the data yourself. Hence, the underlying data is more flexible, and you can create your own reports on this data.
How to set this up in Azure Data Factory
There are some steps in Azure you must through to set this up:
- Create a new Security Group in Azure Active Directory
- Add the managed identity of the Data Factory resource as a member of the security group
- Add the security group to the Admin API tenant setting In the Power BI admin portal
After doing this, Azure Data Factory is allowed to access the Power BI REST API.
Like always, the first thing to do in Azure Data Factory is to create a Linked Service. The Linked Service will be of type REST and needs the following properties:
- Base URL: https://api.powerbi.com/v1.0/myorg/admin/
- Authentication type: Managed Identity
- AAD resource: https://analysis.windows.net/powerbi/api
Now, we have to decide what kind of data we want to import, i.e. which calls we have to make to the API. The following link from Microsoft provides detailed overview of all the available calls and which keywords to use: https://docs.microsoft.com/en-us/rest/api/power-bi/admin.
We will start simple: we want a list of all the reports, all the datasets and all the workspaces. This is done using the following URLs:
Note that the REST API calls workspaces groups. In the last API call you have to specify a limitation parameter of the maximum amount workspaces you want to import. This implies that when you have more than 5000 workspaces, you will not be able to import all of them.
In Azure Data Factory you can construct a new activity for every call, or you can loop over them. Looping is more elegant; we will explain here how this is done. To loop over the calls, you have to create a parameter of type Array. We will call this parameter calls. In our case, the parameter will have the value: [“groups?$top=5000″,”reports”,”datasets”].
We already made a linked service in Azure Data Factory, this is the connection to the API. But we also need to make a dataset that represents this linked service. This is done by clicking on “new dataset” and choosing type REST again. Next to the base URL, there is also the relative URL. The relative URL represents the string that comes after the base URL. This is the part we want to loop over; we want every value of calls to appear at the end of the entire URL. To achieve looping over the different calls you must create a dataset parameter. Let us say we call the parameter relativeurl. Proceed by setting the value Relative URL equal to the dataset parameter, like you can see in the image:
Next up is to create a ForEach activity and insert the calls parameter in the settings as Items. The activity in the ForEach loop will be the copy activity. Choose your dataset and you will see that Azure Data Factory asks for the value of parameter relativeurl. This is where you can put in @item() The request method should be GET, because only want to import data from the API. We do not want to create or update something in the Power BI tenant.
For the Pagination rules you must fill in the following:
Generally, when you try to retrieve a list of objects that is too large using an API, the API wants know how to access the next “page” or results. These settings will make sure the REST API extracts all the data right way.
For the sink of the copy activity, you can really choose anything you like. You can choose an Azure SQL database if you prefer SQL to analyze data or you can choose the Azure Data Lake. Remark that the imported files are in json format. If you prefer the Data Lake then you can connect directly with Power BI on the json files. The transform tab of the Power Query Editor gives you the option parse the json file into tabular format. However, using this method has a downside: you will not be able to build your own history of the data. The call https://api.powerbi.com/v1.0/myorg/admin/reports will return a list of all the reports in your company. If someone constructs a new report tomorrow and then does the same API call, the list will contain one more report. Hence, the API always returns the current “state”. This implies you do not know when a certain report was created.
A better and more consistent option would be to process the data using Azure Databricks. In Databricks you can also convert the json file to a table and then you can add a date column. This way you will know when a report is created, and you will gain insights into the timeline of report creation. The possibilities of data processing are endless with Databricks, and you can knead the data until it is perfectly to your liking.
It is possible to create your own custom usage metrics report based on all the workspaces in the tenant. This can be done with Azure Data Factory and the Power BI REST API. After that you can use Azure Databricks to process the data and construct your own history. Monitoring these usage statistics will give a better understanding of how Power BI is used inside your company.