Power BI Aggregations

Have you ever worked with a large Power BI report that’s just very slow to load? You’ve tried tweaking the model to improve the performance but without any luck. Then maybe aggregations is something to look into.

Using aggregations, you can store the values of a (large) table aggregated by, for example, a date, a product category, a store location,… This way we can visualize these aggregated values on our report. However, the need to enable deeper insight on these aggregated numbers can soon arise. Therefore, we can also go in detail on the data after filtering, without the user noticing anything different. In this blog, we’ll provide you all you need to know to get started with aggregations in Power BI, some pitfalls and some very interesting, newly released, preview features!

Why aggregations?

Let’s start with a simple example, you have a sales table with a couple of million rows connected to a date table in your Power BI model. All of the high-level visuals in your report show information of these sales by different date fields, but of course, you want the user to have the possibility to also drill through the data to see the sales on a more granular level. Currently all these datasets are in Direct Query mode.

You experience the report is getting slower as there are more and more rows added to the sales table. Here, aggregations come into play. Look at it this way: In the ideal scenario you want as much queries to be executed against the smaller aggregated dataset. Only a few times when users really need to drill deeper into the data, we use the much larger detailed dataset.

How to create an aggregated table

You can add a new sales table, aggregated by day. You can group the data already in the source or use the built in ‘Group By’ function in Power Query. In the example below, we aggregated the table by ‘Order Date’ using Power Query.

Connect the newly created aggregated table (e.g. ‘agg_Sales’) to the table(s) with the grouping column(s) being the key. Because in our example we grouped by a date key, I will connect the aggregated table to the Date table.

Manage aggregations

Now, we have a table that is aggregated in our model, but to use all the advantages that Power BI offers with aggregated tables we need to specify the different aggregations of the table. Right-click on the 3 dots of the aggregated table and select ‘Manage aggregations’.

On this window you can select your aggregations:

AGGREGATION COLUMN : the aggregated column from your aggregated table

SUMMARIZATION: the type of summarization specified

DETAIL TABLE: your ‘original’ table, with more details about the aggregated table

DETAIL COLUMN: the corresponding column from your detail table

Don’t forget to specify the ‘GroupBy’ column:

NOTE: when the detail columns are grayed out for you, make sure that the data types between the aggregated table and the detail table match!

When you’ve specified all the aggregations and the corresponding columns, you can click ‘Apply All’.

Did you notice that after creating the aggregated table, the table is automatically hidden? That’s because we won’t use the aggregated table to drag and drop the values to our report, but we will use the columns (or measures created from) the Sales table. Based on whether or not a report consumer requests detailed data, which is not contained within your aggregated table, Power BI will know when it can use the values from the aggregated table and when it needs more detailed information from your regular ‘Sales’ table.

You may think “yes, that’s it!” Well, almost. During the creation of our tables we have chosen Direct Query mode. You can also see this by the thick blue line on top of each table in the new model view in Power BI.

If you think about it, direct query isn’t that interesting for the aggregated table in our example. We want the aggregated data already stored in our model so that we don’t lose time querying the data source. So we’ll need to change the storage mode for the aggregated table from ‘Direct Query’ to ‘Import’. When you do this, you’ll get a pop-up window as shown in the image on the left.

Because our aggregated table is linked to the Date table and the Date table is in Direct Query mode we need to also change the Date table to ‘Dual’ mode. You can see ‘Dual’ mode as a bridge between a ‘Import’ and a ‘Direct Query’ table, in this case the Date table will behave both as an import as a direct query table.

Now you can see that the aggregated table is in import mode as it has no blue line above it, Date is in dual mode as you can see by the dashed blue line.

Are we hitting the aggregated table?

I’ve created 2 visuals in the report: The sales amount by month name (from the date table) and the sales amount by gender (from the customer table).

We expect (and hope) that the first visual will use our aggregated table, because the table is grouped by order date, which is linked to the date table. The second visual should use the not-aggregated sales table, because we have no customer related grouping or relations specified.

One way to determine if visuals are using the aggregated table or not is by using the Performance Analyzer. You can open it by clicking in the menu: ‘View’ > ‘Performance Analyzer’

You can start recording the Performance Analyzer by clicking ‘Start recording’, then click ‘Refresh visuals’, this will reload all the visuals on your report. To make sure no queries are cached, you can use this little trick: Create a blank page, open the blank page and close the Power BI file. Then, re-open the Power BI file. After that is done, the Performance Analyzer will show you the different steps it had to execute to show the visual.

The first visual, the one that should use the aggregated table is ‘SalesAmount by Month’. Here you can see it executed the steps:

  • DAX query (15ms)
  • Visual display (22ms)
  • Other (92ms)

The second visual, the one that should not use the aggregated table is ‘SalesAmount by Gender’. With following steps:

  • DAX Query (1865ms)
  • Direct Query (1860ms)
  • Visual Display (13ms)
  • Other (66ms)

Only the second visual uses a direct query to retrieve data, so that means that our first visual successfully used the imported aggregated table. You can dive deeper into the executed query if you want by clicking ‘Copy Query’.

You can also use DAX Studio to determine if your queries are hitting the aggregated table. Connect to your model and click ‘All queries’ in the ribbon.

You can also clear the cache in DAX studio to make sure nothing is cached. Then you can refresh the visuals in Power BI Performance Analyzer as we’ve done in the previous steps.

In the results pane you can see the executed queries, being the first one the ‘gender’ visual and the second one the sales amount by date. Notice dots before each query, a white dot means, no aggregations are used and the black dot states (as you can see by hovering over it) that we have a hit to our aggregation table.

By enabling server timings and running the query in DAX studio, you can also see why there was no aggregation hit.

Automatic aggregations

If you are lucky enough to work with a premium workspace, you now also have the possibility to use automatic aggregations. This (preview) feature in the Power BI service removes all the hassle of setting up and maintaining your aggregation table. Automatic aggregations use AI on the query logs from your reports to determine which queries are useful to store in a aggregated table.

In the Power BI service you can select ‘Automatic aggregations’ on your dataset, because aggregated tables are stored in import mode you also need to specify the refresh frequency.

Once set up, you have control of the percentage of cached queries you want to store in your workspace. You can play around with the percentage of queries to cache and view the query performance impact and the amount of aggregations stored.

Conclusion

Setting up aggregations can take some time, but imagine the combined time report users will win when they don’t have to wait for slow queries every time they use a report. Especially when you have premium capacity, there are even less reasons to not use aggregations.

Still struggling with your aggregations? Or do you want to know more? Feel free to contact us!

Jef Van den Dungen

Jef Van den Dungen uses wizardry skills in his day-to-data work. He is now active as an experienced consultant, guiding large organizations in rollouts of Power BI. Jef is often consulted for best-practices and performance optimizations.