Composite Models

Imagine having your data model ready and everyone loving it.
Then, all of a sudden, you get this question:

“I have made this Excel file and I would like to combine it with the Sales dataset in Power BI.”

The next thing you do is analyze the request, and for some reason (maybe you don’t want to flood the data warehouse with Excel files, or maybe your ETL team is too busy with other requests) you have to say no.

The business user is quite disappointed, because it would have eliminated that time-consuming task he/she had to do every week.

What if I told you that all the above can be avoided with a very simple solution?

Composite Models are the answer!

These models are a way to seamlessly combine different data sources in Power BI.

With composite models you can connect to your Azure Analysis Services (AAS) model or Power BI dataset and enrich it even further by connecting to a second model, adding flat files, or simply building your own measures on top of your AAS model.

This is all super exciting for both business users and BI developers as it will create even more analytical opportunities.

Now, I know what you are thinking.
Let’s take a look at this feature!

I love your eagerness, but before we dive in, you should know that this feature comes with the following limitations.

The following Live Connect multi-dimensional sources can’t be used with composite models:

  • SAP HANA
  • SAP Business Warehouse
  • SQL Server Analysis Services
  • Power BI datasets
  • Azure Analysis Services


As of December 2020, DirectQuery for Power BI datasets and Analysis Services is available in public preview.
If you would like to use composite models with Power BI datasets and AAS, you need to enable this preview feature.

Now, with that out of the way, let’s get to the exciting part!

In our example, we will use a live connection to a Power BI dataset and an imported Excel file.

First of all, we connect to the Power BI dataset we want to use.
Once this is done and you are ready to add a different source to the model, or just to add your own calculated columns, you will need to go to Transform Data.

You will be prompted with the above information.

As stated previously, a DirectQuery connection is required. This is why the DirectQuery for Power BI datasets and Analysis Services preview feature had to be enabled.

It will also notify you that this change is permanent: you cannot revert it once it has been performed and saved. This is very important to keep in mind.

When we are ready to continue, we can click the Add to Local Model button and a DirectQuery connection will be made.

You know it will be successful when you see the following text in the bottom right corner.

Now we can do quite a few things to the existing model that we couldn’t do before.

For example, if you previously right-clicked on a table, you got the options shown below.

However, we can now create new columns, rename or hide tables, measures and attributes, and create relationships.

This last one will be very handy when adding different sources to your model.
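
For example, with the local model in place, a measure can be defined directly on top of the remote dataset’s tables. A minimal DAX sketch, assuming the remote model exposes a Sales table with a SalesAmount column (both names are hypothetical):

    // Local measure defined on top of the remote Power BI dataset
    Average Sale =
    DIVIDE ( SUM ( Sales[SalesAmount] ), COUNTROWS ( Sales ) )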

Now we can actually add a different source to our model. This can range from Excel or SharePoint files to other Power BI datasets.

Once you add another data source you will once again be prompted with a message.

Of course, when combining different data sources through DirectQuery, there is always the possibility that data is included in queries sent from one data source to the other.

Always be certain of the sources you are pulling your data from.
Once you are sure this is no issue for you, we can continue.

Both models will now exist separately from one another and can be joined through a relationship you define if required.
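
As a sketch of what that enables, assume the imported Excel file contains a Budget table related to the remote Sales table on a shared key; a single measure can then compare figures across the two sources (all table and column names are hypothetical):

    // Sales comes from the remote dataset, Budget from the imported Excel file
    Sales vs Budget % =
    DIVIDE ( SUM ( Sales[SalesAmount] ), SUM ( Budget[BudgetAmount] ) )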

Different visuals can now be built using data from both (or more) sources.

All of this is amazing and a great leap forward for both developers and advanced business users.

Power BI is a tool that keeps improving, and each iteration brings more capabilities to deliver high-quality dashboards to the end user in a short amount of time.

If you are interested in getting to know more about Composite Models or other Microsoft BI-related aspects you can always contact Lytix.


Power BI and Spotify – A match made in heaven?

Data isn’t life. They say music is. Well, is it? Wouldn’t it be awesome to simply combine both worlds? Let’s find out if this could turn out to be a perfect marriage.

Today, with approximately 286 million users to date, including 130 million subscribers,
Spotify can be considered one of the world’s most popular music streaming services.
By remarkable coincidence, I happen to consume their service every once in a while myself. Actually, that is an understatement, as I stream music basically on a daily basis:
during travel, during development, while exercising, or even when writing this blog at home.

What would we like to find out?

Regarding my streaming behavior, I came up with the following questions:

> What artists did I favor the most?

> What songs did I listen to the most?

> Can certain events, such as festivals, cause a change in my streaming behavior?

> When did I stream music?

What KPIs can we define to provide us with the answers?

Of course, we need to quantify these questions with KPIs. I believe there are two ways of measuring whether I like or dislike a specific song. The first is the time I actually listened to the song, which I will refer to as the listening time (in minutes). The second KPI is the song count: the number of times a song started, regardless of whether I manually pressed the play button or Spotify’s autoplay started the song.
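
As a sketch, assuming the streaming history lands in a table called Streams with one row per started song and an msPlayed column holding the milliseconds listened (both names hypothetical), the two KPIs translate into simple DAX measures:

    // Listening time in minutes
    Listening Time (min) =
    DIVIDE ( SUM ( Streams[msPlayed] ), 60000 )

    // Number of songs started, whether manually or by autoplay
    Song Count =
    COUNTROWS ( Streams )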

Where can we gather the data we need?

Spotify allows its users to retrieve their data through the Spotify API. Recently, they launched a refined version of this API which, at the moment of writing, is still in beta. I can only praise Spotify for how well they document their API. They offer a clear description of the process to connect to the API, written-down call definitions and even a test environment console for developers. More information can be found here.

Evaluation time!

As we now have all data in place, we can assemble our very own Spotify dashboards with Power BI. Let’s analyze my streaming behavior of the past year by trying to formulate an answer to our predefined questions. 😊

What artists did I favor the most?

Post Malone seems to be the clear winner based on my streaming behavior; there’s no doubt there. I tend to have a preference for R&B, with Tyga and Blackbear claiming a spot in my top 5 artists.

What is interesting here is that my top 5 artists differ slightly when switching between our two predefined KPIs. Spots 1-3 are identical, regardless of the KPI we base ourselves upon. However, in terms of song count, Nickelback and Loud Luxury are my respective numbers 4 and 5 whereas, in terms of play time, Loud Luxury and Blackbear take those positions. This can have two possible explanations:

1) Nickelback songs are simply shorter than Blackbear songs

or

2) I’m excited to hear the refrains of Nickelback songs and get bored afterwards, forcing myself to switch songs once the refrain has passed.

What songs did I listen to the most?

It won’t surprise you that I listened more than 150 times to Post Malone’s top songs, such as A Thousand Bad Times, Sugar Wraith and Goodbyes.

Can certain events, such as festivals, cause a change in my streaming behavior?

Yes! Last August I attended the Sziget festival in Hungary. According to my streaming history, I enjoyed Post Malone’s performance so much that, in the following months, Post Malone skyrocketed and became my number one streaming artist.

When did I stream music?

It turns out I’m actually getting value out of my monthly subscription. In the past year, I streamed on 357 out of 365 days. My listening time stays more or less steady throughout the week. During the work week, I mainly stream music while traveling between home and work. And in case you were wondering: no, I’m not an early bird. 😊

Wrap-Up

This article described one way of visualising your Spotify streaming behavior with Power BI. Of course, there are many more insights these reports can be enriched with. I tried to provide a high-level summary of how I created this analysis. If you’re interested in the technical details or would like help solving your own data visualisation questions, do not hesitate to get in touch with me or any of my colleagues at Lytix.

All aboard the data train today, CHOO CHOO 🚂 

Lou Segers
Microsoft Analytics Consultant @ Lytix
Follow me on LinkedIn

10 Performance tips to speed up your Power BI reports

You arrive at work, and the first thing you do is open your Power BI report and go for a coffee while waiting for it to render. Sound familiar?

Well, it shouldn’t. Your reports should render instantly, especially if there is a large or C-level audience consulting them on a regular basis. Luckily enough there are some tips and tricks to melt your frustrations like snow in the sun.

1. Only keep necessary visuals

Every single additional visual needs time to render and will therefore slow down your report. And if, in some cases, such as when relying on bookmarks, you really see added value in enriching your report with a bunch of shapes or data visualizations, please do hide them whenever possible. As hidden visuals won’t be rendered, they have no impact on report performance at all.

2. Use integers instead of text values whenever possible

String values do not compress as well as integer values. By replacing string values with integer values, preferably at source level, both .pbix file size and total memory footprint will shrink considerably. Examples: adjust the data types of numeric codes to integer where possible, and use 1/0 instead of TRUE/FALSE.

3. Stick to Power BI’s default visuals

Report developers are often very tempted to scroll through the list of third-party visuals in the marketplace and end up importing them. In some specific business scenarios, they can actually be very suitable. However, please do use them with care. Some of the biggest performance wins I’ve ever achieved came from simply converting third-party visuals to default visuals.

4. Avoid bi-directional relationships in your data model

The number one performance killer. A bi-directional relationship between two tables means both tables cross-filter each other. This can seriously hurt report performance, as every interaction is likely to trigger a cascade of filtering actions, especially when your data model contains multiple enabled bi-directional relationships. Optimal data model design is key. Alternatively, DAX’s CROSSFILTER() function can come in handy as well.
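
As a sketch, cross-filtering can be activated only inside the measures that actually need it, while the relationship itself stays single-directional (table and column names are hypothetical):

    // Counts customers appearing in the current Sales selection by enabling the
    // Sales -> Customer filter direction for this measure only
    Customers With Sales =
    CALCULATE (
        COUNTROWS ( Customer ),
        CROSSFILTER ( Sales[CustomerKey], Customer[CustomerKey], BOTH )
    )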

5. Understand the difference between duplicate and reference queries

Duplicating a query simply creates a one-time copy of the M code, whereas referencing a query pushes any changes made to the referenced query through to all queries that refer to it. Does this impact report performance? No, but it goes without saying that the latter improves query maintainability, as you only need to adjust the referenced query.

6. Only load data which will be reported upon

What about the options to “enable load” and “include in report refresh”?

Many of you have probably come across these options when developing reports but don’t know exactly how to interpret their function.

When the “enable load” checkmark is unchecked, data won’t be loaded into memory. The respective query will still be available in the query tab but will not clutter your data modeling or report view. This is often used for intermediary queries: intermediate results you want to refer to, or merge/append other queries with.

There is a clear distinction with the “include in report refresh” option: when this checkmark is unchecked, data simply won’t be refreshed when you click the refresh button of your report. These query fields will still be visible in your data modeling and report view if the “enable load” option is checked. This feature is perfect for preventing static data from refreshing redundantly.

Additionally, incremental refresh can be configured as well, for example to prevent frozen historic data from refreshing unnecessarily. This feature used to be Premium-exclusive, but as of the end of February 2020 it is generally available for Pro users as well, so make sure to check it out!


7. Slicers can be real performance killers

These are quick wins: change your slicers to dropdowns instead of lists. When a user opens a report, all (non-hidden) visuals require render time: the time Power BI needs to actually load and show a data visualization. Slicer lists take longer to render than slicer dropdowns, as the former present all filter values on report initialization. The latter only load filter values when a user opens the dropdown.


Even more important is the choice of slicer visual. As stated before, you should try to stick to default visuals where possible instead of relying on third-party visuals from the marketplace. However, there are some value-adding visualizations in there as well. A clear example of a visual that significantly contributes to user experience is the marketplace’s so-called ‘HierarchySlicer’. This visual, unsurprisingly, enables users to add multiple fields to one and the same slicer, creating a hierarchy. At the time of report creation, Power BI did not offer the possibility to add multiple fields to its default slicer… yet. As of Power BI’s February 2020 update, the default slicer also supports hierarchies. In fact, one of the biggest performance wins I came across consisted of simply changing the third-party ‘HierarchySlicer’ to the Power BI default variant, reducing report render time by an astonishing 90%. Continue reading to learn how you can measure performance.


8. Rethink your data modeling technique

I’ve had fantastic experiences reporting on a dimensional model. Bottom line: creating the report look and feel should only take up a small proportion of your total effort. Designing the right data model will largely decide whether your reports behave in a performant way. I am aware there are several ways to design a data model, but I can only strongly advise sticking to Kimball’s approach. Other best practices you should keep in mind include:

> The use of surrogate keys

> The use of role-playing dimensions

> The determination of fact(s), possibly enabling parallel loads

> No snow-flaking

Please do keep in mind that I only scratched the surface of dimensional modeling here. Always question your decisions in your journey towards the ideal data model. Critical thinking is key. In most cases, an additional pair of eyes won’t hurt.

9. Split granularity to lower cardinality

To perform calculations, one or more requests are sent to the storage engine where the data is stored. This can be either an external relational engine (DirectQuery) or Power BI’s own storage engine, called the VertiPaq engine (Import). Due to the variety of DirectQuery connections, I’ll be focusing on the latter. In contrast to a traditional SQL database, which evaluates data row by row, the VertiPaq engine stores data in memory in a columnar format.

With this small introduction to the VertiPaq engine, I want to draw your attention to how Power BI handles its imported data: the higher the cardinality of a column, the longer Power BI will need to perform its calculations on that data. Put differently: try to limit the number of distinct values of an attribute.

For example, you could split datetime fields into a date and a time field. Consequently, memory load will be reduced to a minimum, as both fields have a much lower cardinality than the original datetime field. This is very case-specific, but I hope you were able to grasp the idea.
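
As a sketch, with a hypothetical Orders[OrderDateTime] column, the split could be done as two DAX calculated columns (although doing this upstream, in Power Query or at the source, compresses even better):

    // Date part only: at most 365 distinct values per year
    Order Date =
    DATE ( YEAR ( Orders[OrderDateTime] ), MONTH ( Orders[OrderDateTime] ), DAY ( Orders[OrderDateTime] ) )

    // Time part rounded to the minute: at most 1,440 distinct values
    Order Time =
    TIME ( HOUR ( Orders[OrderDateTime] ), MINUTE ( Orders[OrderDateTime] ), 0 )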

10. Measure performance!

Get a clear view of how a report is currently performing, so you can quantitatively measure and compare the outcome of any action you take. Power BI has a built-in feature, called the performance analyzer, which lets you measure how much time is spent on rendering visuals and how much time is spent on executing DAX code. It even gives you the full extract of the DAX code, which you can analyze further in tools such as DAX Studio.

Remember the difference between Power BI’s updated default slicer and the HierarchySlicer imported from the marketplace? I compared their performance on one of my reports, resulting in the following excerpts:

Wrap-Up

The above tips and tricks are some of the actions you can take when tuning the performance of Power BI reports. I did not intend to assemble a complete list. There are several more options, but I can only advise starting with the ones described above. If none of them result in the performance you were hoping for, do not hesitate to get in touch with me or any of my colleagues at Lytix.

All aboard the data train today, CHOO CHOO 🚂 

Lou Segers
Microsoft Analytics Consultant @ Lytix
Follow me on LinkedIn

Governed Corporate Power BI

Best wishes and many Data Insights. 

Oleon, my client, is at the forefront of the growing market of natural chemistry. As lead of the Analytics team at Oleon, it is my desire to provide useful, easy-to-read, business-driven reports.

In a previous article I pondered the question of SAC vs Power BI for visualization.

Now, 1.5 years later, we are still confident in the combination of Power BI on a BW 7.5 analytics system and HANA 2.0. And during the holiday season, I have the luxury of being able to look back.

We are using Power BI in the cloud: easily accessible, with automatic login on corporate laptops. End users and self-service users only need a browser. For handheld devices there is also an app, should the user prefer it over a browser.

Learning Curve of Power BI

As with any new tool, there is a learning curve. For end users and self-service users this is close to instant. Almost all the questions I get from this side are functional, as it should be. Adding an information page or a link to definitions is a good help here.

For power users and developers, the plus of Power BI is that the first learning bump is easy to take and fills your head with grand dreams and a hunger for more. The second learning bump (programming with DAX and modelling for advanced visuals/calculations) is much higher.

Modelling

It is important to know and learn where to do which coding, modelling and manipulation, and to get everybody in the team on the same page. This takes time and there is no easy answer.

With BW on HANA you can model and code in the source system (usually SAP), in BW layers (the old LSA+, which with HANA you try to minimize to the essentials, virtualizing the rest), in HANA calculation views and finally in Power BI itself. (Personal note: I prefer to link Power BI to HANA views, as I find it more stable and mature.)

Classic SAP consultants will model too little in Power BI and classic Microsoft consultants vice versa, both making it far more complex than needed. Both need to understand each other’s strengths to get optimal results in the most efficient way. Coming from a strong SAP background myself, I had to learn that modelling with master data alongside transaction data in Power BI has benefits in certain scenarios (filter context, SUMX, …).

This approach requires shifting more logic from the reports (BEx queries) to the HANA views. Only aggregation-dependent logic and visualization logic are still needed in your reporting layer. The plus side is that this approach makes it easier to switch to new reporting tools in the (far?) future.

Source systems

Power BI will happily let you combine data coming from all sorts of source systems, on-premises or not, and from both production and development, so you will need clear guidelines to prevent blunders (such as reporting development data against a production system). We of course use the ‘import’ method, since we do not use a SQL layer in Azure or elsewhere.

A follow-up on this is flat files. The classic SAP approach is to upload your files into an aDSO, transport this set-up to production, and publish it to reporting (2 to 4 hours to set up). The advantage is that this process is more IT-governed, with more fail-safes in case of data entry errors. The alternative is to place flat files on OneDrive (easier to set up) or a network drive and refresh the data. This set-up is way faster and typically takes about 5 minutes. The data refresh can also be automated via the gateway. This second approach can be applied to files that are and will be relevant only for this one report and are maintained by IT.

Visuals

In your first hours of using Power BI you will quickly dive into the marketplace for extra fancy visuals, only to find out later that the better visuals in there cost money. Much later you discover that these marketplace visuals have caveats: they lack language translations, behave differently for tooltips, don’t work in the first days after a new version, and so on. In a corporate environment, you will mainly want to stick to the visuals provided by Microsoft.

Governance

Governance is absolutely essential; the playful joy of Power BI almost makes you forget this, potentially catapulting your company back to 1980. Governance is a chapter on its own. My wonderful colleague Sander from Lytix wrote a short take on this.

His blog can be read on this link: https://lytix.be/governed-self-service-a-power-bi-rollout/

Final thought

Power BI has a low entry threshold, still offers the option of complex constructs when needed, and comes with an excellent distribution platform in the cloud. There is a large user base and a lot of information, YouTube videos and more. Visualisation is only the tip of the analytics iceberg, but it is the part that users see and interact with. This layer has to be, and is, easy to read, quick for pinpointing opportunities, and trouble-free and time-saving to use, inviting users to look more at their data.

Blog by Carl Goossenaerts

Lytix’ XTL Framework

You’ll often hear a Lytix consultant say that he/she is proud of our XTL Framework. What is it exactly? And how can it improve your business?

Why do I need a Data Warehouse (DWH)?

Data warehouses support organisations in their decision making by providing data in a well-governed way. This includes:

– Integration of source systems: organisations prefer a mix of solutions to achieve a competitive advantage. Often, large ERP systems (Dynamics AX for Finance and Operations, SAP, Odoo, …) are combined with niche products (Salesforce, Google Analytics, Silverfin, Exact Online, …) to support their business processes. In an analytical setting, this data also needs to be enriched with open APIs (weather APIs, ID verification, …).

– Single source of truth: valuable time can be wasted if numbers cannot be trusted. A DWH provides you with one validated catalogue of data.

– Easy, managed access: data is provided in a way that lets business users crunch large quantities of data, while still applying security; users should be able to see only what they’re entitled to see (no more, no less).

– A foundation for Data Science / AI: provide a well-structured data repository for your data scientists and minimize the valuable time they spend cleaning up and combining sources.

Why do I need Lytix’ XTL Framework?

A lack of internal engineering experience can result in a solution that is unsustainable in the long term. It gets even worse if the person who built it leaves the organisation. Our framework provides unified dimensional models and minimizes ETL engineering.


Advantages of the XTL Framework:

– Unified way of working: in each of our supported technologies (Azure SQL Database, Snowflake, Databricks and Azure Synapse), the main task of engineers is to convert business logic to SQL, a query language that is widely known and that can be relocated to other technologies.

– Maximized logging (with minimal effort during development): logging modules are automatically wrapped around ETL statements, which removes a time-intensive burden from developers. Within these modules, all events are captured: the number of reads, inserts, updates and deletes, as well as warnings and errors.

– Built for the cloud: the framework is built in such a way that it takes advantage of cloud benefits: pay-as-you-use and easy scale-up and scale-down.

– Applied Kimball: the modelling techniques are based on best practices defined by the renowned data modeller Ralph Kimball (The Data Warehouse Toolkit, Ralph Kimball & Margy Ross). The most-used and popular Slowly Changing Dimension (SCD) types are supported, such as ‘SCD0: Retain the original’, ‘SCD1: Overwrite with newer values’, ‘SCD2: Track history by adding new rows’, ‘SCD3: Add new attribute’, ‘SCD6: Add type 1 attributes to a type 2 dimension’ and ‘SCD7: Dual type 1 and type 2 dimensions’. Switching an attribute from SCD type 1 to SCD type 2 (and thus tracking history) is just a matter of a click.

Additional Features

Automated Documentation

The XTL framework is accompanied by a comprehensive report providing you with insights into the ETL processes. This report provides both standard metric evolutions (number of updates, deletes, failures, …) and a dependency view which maps your sources to their targets and shows you the applied transformations.

Dependency Tree

Our framework can automatically generate dependency trees based on the metadata of your transformations. Generating the most optimal parallel execution steps within your whole ETL becomes child’s play. Do you need to refresh only one, two or three tables because you need updated data fast (e.g. for month-end closing)? No problem: the dependency tree will only trigger those ETL steps that are directly related to the tables you want updated.

No ‘All Or Nothing Approach’ in Batches

We often hear the following complaint from customers not using our framework: “I know one little package failed, but does that mean that the whole data flow to the business should fail?!”

Our framework addresses this issue! The dependency tree allows you to decide what to do upon failure: will you simply ignore the failure and continue with your ETL, or do you stop abruptly to make 100% sure the error doesn’t cause any further problems? This is fully configurable in our framework, and you can make this decision per ETL step.

Data Quality Reporting

Our framework includes a ‘Data Quality’ module. Based on business rules you provide, the framework looks for inconsistencies and faulty input and reports the findings back to the business.


Components

The following components are used throughout the framework:

– Azure Data Factory is used as orchestrator and scheduler. This component kicks off at the requested time (at midnight, once per hour, …) and makes sure all dependencies are initiated in a logical order.


– A ‘Storage and Compute Module’ to perform the necessary transformations and provide the data to data consumers, be it business users, data scientists or even mobile devices and web portals. One of four technologies can be implemented as a back-end for this component:
    o Azure SQL Database: a Microsoft product for customers who are already familiar with SQL Server and where the Data Warehouse needs lots of integrations.
    o Snowflake: provides both a data lake and strong compute for your analytical needs.
    o Databricks: if streaming and distributed computing are necessary to build an all-encompassing Data Warehousing solution. This platform integrates best with organisations that have a large data science team!
    o Azure Synapse (formerly Azure SQL Data Warehouse): provides virtually limitless compute for customers with more than 4 TB of data.


– An analytics engine which allows business users to create their own reports, be it Azure Analysis Services or Power BI Pro/Premium.

Wrap-Up

This article describes one of our proven ways to fuel your data-driven organisation, covering a multitude of use cases ranging from traditional reports to data science, all in a governed way.

In need of some help? Do not hesitate to contact Lytix or the author, Sander Allert, and we will be glad to exchange ideas on this subject!