10 Performance tips to speed up your Power BI reports

You arrive at work, open your Power BI report and go for a coffee while you wait for it to render. Sound familiar?

Well, it shouldn’t. Your reports should render instantly, especially if a large or C-level audience consults them on a regular basis. Luckily, there are some tips and tricks to melt your frustrations like snow in the sun.

1. Only keep necessary visuals

Every single additional visual needs time to render and will therefore slow down your report. If you really see added value in enriching your report with a bunch of shapes or data visualizations, for example when relying upon bookmarks, please do hide them whenever possible. Hidden visuals are not rendered, so they have no impact on report performance whatsoever.

2. Use integers instead of text values whenever possible

String values do not compress as well as integer values. By replacing string values with integer values, preferably at source level, both the .pbix file size and the total memory footprint will shrink considerably. Examples: adjust the data types of numeric codes to integers where possible and use 1/0 instead of TRUE/FALSE.
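
As a minimal illustration (table and column names are hypothetical, and ideally you make these changes at the source or in Power Query rather than in DAX), the two examples above could look like this as calculated columns:

```dax
-- Hypothetical calculated columns on a 'Sales' table; ideally apply these
-- conversions at the source or in Power Query so the text/boolean originals
-- never reach the model.

-- Numeric code stored as text -> whole number
ProductCodeInt = VALUE ( Sales[ProductCode] )

-- TRUE/FALSE flag -> 1/0
IsReturnedFlag = IF ( Sales[IsReturned], 1, 0 )
```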

3. Stick to Power BI’s default visuals

Report developers are often tempted to scroll through the list of third-party visuals in the marketplace and end up importing them. In some specific business scenarios they can actually be very suitable, but please do use them with care. Some of the biggest performance wins I have ever achieved were accomplished by simply converting third-party visuals to default visuals.

4. Avoid bi-directional relationships in your data model

This is the number one performance killer. A bi-directional relationship between two tables means that both tables cross-filter each other. This seriously hurts report performance, as every interaction is likely to trigger a cascade of filtering actions, especially when your data model contains multiple bi-directional relationships. Optimal data model design is key. Alternatively, DAX’ CROSSFILTER() function can come in handy as well.
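
As a sketch of that alternative (measure, table and column names are hypothetical), you can keep the physical relationship single-directional and enable cross-filtering only inside the measure that needs it:

```dax
-- Hypothetical model: Customer (1) -> Sales (many), relationship kept single-directional.
-- CROSSFILTER enables bi-directional filtering only while this measure is evaluated.
Customers With Sales =
CALCULATE (
    DISTINCTCOUNT ( Customer[CustomerKey] ),
    CROSSFILTER ( Sales[CustomerKey], Customer[CustomerKey], BOTH )
)
```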

5. Understand the difference between duplicate and reference queries

Duplicating a query simply creates a one-time copy of the M query code, whereas referencing a query pushes any change made to the referenced query through to all queries that refer to it. Does this impact report performance? No, but it goes without saying that the latter improves query maintainability, as you only need to adjust the referenced query.

6. Only load data which will be reported upon

What about the options to “enable load” and “include in report refresh”?

Many of you have probably come across these options when developing reports but don’t know exactly how to interpret them. When the “enable load” checkmark is unchecked, the data won’t be loaded into memory. The respective query will still be available in the query editor but will not clutter your data modeling or report view. This is often used for intermediary queries: intermediate results you want to refer to, or merge/append other queries with.

The “include in report refresh” option is clearly distinct: when you uncheck it, the data simply won’t be refreshed when you click the refresh button of your report. These query fields will still be visible in your data modeling and report view as long as “enable load” is checked. This feature is perfect for preventing static data from refreshing redundantly.

Additionally, incremental refresh can be configured as well, for example to prevent frozen historic data from refreshing unnecessarily. This feature used to be Premium-exclusive, but it has been generally available to Pro users since the end of February 2020, so make sure to check it out!


7. Slicers can be real performance killers

Here is a quick win: change your slicers from list to dropdown. When a user opens a report, all (non-hidden) visuals require render time: the time Power BI needs to actually load and show a data visualization. Slicer lists take longer to render than slicer dropdowns, as the former present their filter values on report initialization, while the latter only load filter values once a user opens the dropdown.


Even more important is the choice of slicer visual. As stated before, you should try to stick to default visuals where possible instead of relying upon third-party visuals from the marketplace. However, there are some value-adding visualizations in there as well. A clear example of a visual that significantly contributes to user experience is the marketplace’s so-called ‘HierarchySlicer’. This visual, unsurprisingly, enables users to add multiple fields to one and the same slicer, creating a hierarchy. At the time the report was created, Power BI did not yet offer the possibility to add multiple fields to its default slicer. As of Power BI’s February 2020 update, the default slicer also supports hierarchies. In fact, one of the biggest performance wins I have come across consisted in simply changing the third-party ‘HierarchySlicer’ to the Power BI default variant, reducing report render time by an astonishing 90%. Continue reading to learn how you can measure performance.

 

8. Think your data modeling technique through

I’ve had fantastic experiences reporting on a dimensional model. Bottom line: creating the report’s look and feel should only take up a small proportion of your total effort. Designing the right data model is what mainly decides whether your reports will behave in a performant way. I am aware there are several ways to design a data model, but I can only strongly advise you to stick to Kimball’s approach. Other best practices to keep in mind include:

> The use of surrogate keys

> The use of role-playing dimensions (see the sketch after this list)

> The determination of fact(s), possibly enabling parallel loads

> No snow-flaking
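
To illustrate the role-playing dimensions bullet (all names are hypothetical), a single Date dimension can play several roles through one active and one or more inactive relationships, activated per measure with USERELATIONSHIP:

```dax
-- Hypothetical model: Sales[OrderDateKey] has the active relationship to 'Date',
-- Sales[ShipDateKey] has an inactive one. This measure reuses the same Date
-- dimension in its "ship date" role.
Orders by Ship Date =
CALCULATE (
    [Total Orders],    -- assumed base measure
    USERELATIONSHIP ( Sales[ShipDateKey], 'Date'[DateKey] )
)
```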

Please note that I have only scratched the surface of dimensional modeling here. Always try to question your decisions on your journey to the ideal data model. Critical thinking is key, and in most cases an additional pair of eyes won’t hurt.

9. Split granularity to lower cardinality

To perform calculations, one or more requests are sent to the storage engine where the data is stored. This can be either an external relational engine (DirectQuery) or Power BI’s own storage engine, called the VertiPaq engine (Import). Due to the variety of DirectQuery connections, I’ll be focusing on the latter. In contrast to a traditional SQL database, which evaluates data row by row, the VertiPaq engine stores data in memory in a columnar format.

With this small introduction to the VertiPaq engine, I want to draw your attention to how Power BI handles its imported data: the higher the cardinality of a column, the longer Power BI needs to perform calculations on it. In other words: try to limit the number of distinct values of an attribute.

For example, you could split datetime fields into a separate date field and a time field. Memory load will then drop considerably, as both fields have a much lower cardinality than the original datetime field. This is very case-specific, but I hope you grasp the idea.
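
Purely to illustrate the idea (names are hypothetical, and in practice the split is best done at the source or in Power Query so the original datetime column is not imported at all):

```dax
-- Hypothetical calculated columns on a 'Sales' table. DAX stores datetimes as
-- serial numbers, so INT() yields the date part and the remainder is the time part.
Order Date = INT ( Sales[OrderDateTime] )
Order Time = Sales[OrderDateTime] - INT ( Sales[OrderDateTime] )
```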

10. Measure performance!

Get a clear view of how a report is currently performing so you can quantitatively measure and compare the outcome of any action you take. Power BI has a built-in feature, called the Performance Analyzer, that lets you measure how much time is spent on rendering visuals and how much time is spent on executing DAX code. It even gives you the full extract of the DAX query, which you can analyze further in tools such as DAX Studio.
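
For reference, a query copied from the Performance Analyzer pane is a plain DAX query that you can paste straight into DAX Studio; a simplified, hypothetical example could look like this:

```dax
-- Simplified, hypothetical example of a visual's query as copied from the
-- Performance Analyzer, ready to be analyzed further in DAX Studio.
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Year],
    'Product'[Category],
    "Total Sales", [Total Sales]
)
```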

Remember the difference between Power BI’s updated default slicer and the HierarchySlicer imported from the marketplace? I’ve compared their performance on one of my reports, resulting in the following excerpts:

Wrap-Up

The above tips and tricks are some of the actions you can take while tuning the performance of Power BI reports. This is not meant to be a complete list; there are several more options, but I can only advise you to start with the ones described above. If none of them result in the performance you were hoping for, do not hesitate to get in touch with me or any of my colleagues at Lytix.

All aboard the data train today, CHOO CHOO 🚂 

Lou Segers
Microsoft Analytics Consultant @ Lytix
Follow me on LinkedIn

Lytix’ XTL Framework

  You’ll often hear a Lytix consultant say that he/she is proud of our XTL Framework. What is it exactly? And how can it improve your business?  

Why do I need a Data Warehouse (DWH)?

Data Warehouses support organisations in their decision making by providing data in a well-governed way. This includes:

– Integration of source systems: organisations prefer a mix of solutions to achieve a competitive advantage. Often, large ERP systems (Dynamics AX for Finance and Operations, SAP, Odoo, …) are combined with niche products (SalesForce, Google Analytics, Silverfin, Exact Online, … ) to support their business processes. In an analytical setting, this data also needs to be enriched with open-source APIs (weather APIs, ID verification, … ).

– Single source of truth: valuable time can be wasted if numbers cannot be trusted. A DWH provides you with one validated catalogue of data.

– Easy Managed Access: data is provided in a way that allows business users to crunch large quantities of data, while still applying security; users should be able to see only what they’re entitled to see (no more, no less).

– A foundation for Data Science / AI: provide a well-structured data repository for your data scientists and minimize the valuable time they spend cleaning up and combining sources.

Why do I need Lytix’ XTL Framework?

A lack of internal engineering experience can result in a solution that is unsustainable in the long term, and it gets even worse if the person who built it leaves the organisation. Our framework provides unified dimensional models and minimizes ETL engineering.

 

Advantages of the XTL Framework:

– Unified way of working: in each of our supported technologies (Azure SQL DB, Snowflake, Databricks and Azure Synapse), the main task of engineers is to convert business logic to SQL, a query language that is widely known and that can be ported to other technologies.

– Maximized logging (with minimal effort during development): logging modules are automatically wrapped around ETL statements, which takes a time-intensive burden off developers. Within these modules, all events are captured: the number of reads, inserts, updates and deletes, as well as warnings and errors.

– Built for the cloud: the framework is built in such a way that it takes advantage of cloud benefits: pay-as-you-use pricing and easy scaling up and down.

– Applied Kimball: the modelling techniques are based on best practices defined by the renowned data modeller Ralph Kimball (The Data Warehouse Toolkit, Ralph Kimball & Margy Ross). The most-used and popular Slowly Changing Dimension (SCD) types are supported, such as ‘SCD0: Retain the original’, ‘SCD1: Overwrite with newer values’, ‘SCD2: Track history by adding new rows’, ‘SCD3: Add a new attribute’, ‘SCD6: Add type 1 attributes to a type 2 dimension’ and ‘SCD7: Dual type 1 and type 2 dimensions’. Switching an attribute from SCD type 1 to SCD type 2 (… and thus tracking history) is just a matter of a click.

Additional Features

Automated Documentation

The XTL framework is accompanied by a comprehensive report providing you with insights into the ETL processes. This report provides both standard metric evolutions (number of updates, deletes, failures, …) and a dependency view that maps your sources to their targets and shows you the applied transformations.

Dependency Tree

Our framework automatically generates dependency trees based on the metadata of your transformations. Generating the optimal parallel execution steps within your whole ETL becomes child’s play. Do you need to refresh only one, two or three tables because you need updated data fast (e.g. for month closing)? No problem, the dependency tree will only trigger those ETL steps directly related to the tables you want updated.

No ‘All Or Nothing Approach’ in Batches

We often hear the following complaint from customers not using our framework: “I know one little package failed, but does that mean that the whole dataflow to the business should fail?!”

Our framework addresses this issue! The dependency tree allows you to decide what to do upon failure: do you just ignore the failure and continue with your ETL, or do you stop abruptly to make 100% sure the error doesn’t cause any further problems? This is fully configurable in our framework, and it allows you to make this decision per ETL step.

Data Quality Reporting

Our framework includes a ‘Data Quality’ module. Based on the business rules you provide, the framework will look for inconsistencies and malicious input and report its findings back to the business.

 

Components

The following components are used throughout the framework:

– Azure Data Factory is used as the orchestrator and scheduler. This component kicks off at the requested time (at midnight, once per hour, …) and makes sure all dependencies are initiated in a logical order.


– A ‘Storage and Compute Module’ to perform the necessary transformations and to provide the data to data consumers, be it business users, data scientists or even mobile devices and web portals. One of four technologies can be implemented as the back-end for this component:
    o Azure SQL Database: a Microsoft product for customers who are already familiar with SQL Server and whose Data Warehouse needs lots of integrations.
    o Snowflake: provides both a data lake and strong compute for your analytical needs.
    o Databricks: if streaming and distributed computing are necessary to build an all-comprehensive Data Warehousing solution. This platform integrates best with organisations that have a large data science team!
    o Azure Synapse (formerly Azure SQL Data Warehouse): provides virtually limitless compute for customers with more than 4 TB of data.


– An analytics engine that allows business users to create their own reports, be it Azure Analysis Services or Power BI Pro/Premium.

Wrap-Up

This article describes one of our proven ways to fuel your data-driven organization in a governed way, covering a multitude of use cases ranging from traditional reporting to data science.

In need of some help? Do not hesitate to contact Lytix or the author Sander Allert and we/he will be glad to exchange ideas on this subject!