One of the most valuable, yet lesser-used features of Power BI is ‘Paginated Reports’. This article explains what they are, when you need to use them and how they can benefit your organisation. At the end, some samples can be downloaded that demo the functionality.
Starting Off with an Example
As you can see in the GIF below, ‘Power BI Paginated Reporting’ provides pixel-perfect and print-friendly reporting. This contrasts sharply with standard Power BI reports, which offer lots of flexibility but are not print-friendly. The way of interacting with filters is fundamentally different, as is the interaction with visuals: selecting a category in a bar chart does not filter or highlight other visuals in the report.
Pixel-Perfect Reporting Throughout Time
Before the rise of Power BI, Microsoft provided another tool for reporting: ‘SQL Server Reporting Services’ (SSRS). SSRS provided reports to business users on the IT premises of the customer. Most reports first prompted you for parameters before even showing a result. Once these parameters had been filled in, the report was generated by issuing pre-made queries (= SSRS datasets). Often, the design of these reports was limited to long lists and matrices of numbers and statistics. The typical goal of such reports is mass distribution of a PDF variant or exports to Excel. This pre-made exporting/rendering functionality is often referred to as ‘pixel perfect’. Only a more experienced developer could implement features such as cascading filters, drill-through actions and groupings. Nevertheless, these reports always did exactly what was expected of them, offering great trustworthiness and stability.
Developing a Paginated Report
Creating a paginated report is completely different from creating a Power BI desktop model. This is already demonstrated by the fact that you need to install separate software called ‘Power BI Report Builder’. Some terminology might be confusing; a dataset serves a totally different purpose in Power BI desktop than it does in Paginated Reports. There is hardly any drag-and-drop experience for creating the perfect dataset, and you should know upfront how you want things visualised. Therefore, we often recommend that a Paginated Report be created by someone who has experience with SSRS or a more technical background. Thus, creating Paginated Reports doesn’t provide the same self-service capabilities as standard Power BI reports.
Paginated Reports can only be created if you make use of one of these licenses:
– Power BI Premium
– Power BI Embedded, at least an A4 instance
– Power BI Premium per User
Hence, you will not be able to work with Paginated Reports if you only have a Pro license.
Other Tips and Tricks
– If you can’t find the correct documentation or walkthroughs for Paginated Reports, try your luck and add the keyword ‘SSRS’ to it.
– The use cases where Paginated Reports shine are the following: long multi-page lists (such as price lists), pixel-perfect rendering (such as Profit and Loss statements) and mass distribution of PDFs or flat files.
– Paginated Reports use the same extension (.rdl) as SSRS reports. This means SSRS reports can easily be converted into Paginated Reports.
– Better together: you can create links between Power BI and Paginated Reports. For example, you could start from a standard Power BI report and, when users click a button, have a Paginated Report generate a print-friendly version.
Sander Allert is an experienced BI architect with a passion for following new trends. Sander is passionate about data in all of its aspects (Big Data, Data Science, Self-Service BI, Master Data, …) and loves to share his knowledge. If you need help with architectural decisions, do not hesitate to invite Sander over for a coffee to share some ideas.
Application Lifecycle Management (ALM) refers to a (software) development process that has been set up in a governed and easy-to-manage way. ALM provides added value to the development team, project managers and business users. While ‘ALM’ is mostly associated with pure software development projects (… written 100% in programming languages), BI projects (which are by nature less programmatic) can also adhere to these high standards. In this blog you will get to know how we at Lytix adhere to ALM standards by using Azure DevOps, learn the benefits and see why it’s a ‘must-have’ component in BI implementations.
What problems does ALM solve?
When not working in an ALM-compliant manner, you often hear the following complaints from business and development:
I’ve accidentally made a change to an object (report, ETL task, …) and I only made a copy three weeks ago… All my intermediate changes have been overwritten!
There are different versions circulating on the different environments; there are structural differences between the reports of the ‘production’ environment and the ‘business sandbox’ environment.
I have no idea what the development team is currently doing, while they should prioritise bug fixing they are probably creating unwanted features.
Why is it so hard to provide me a list of all changes that are made to resolve this problem?!
Even the simplest tests have not been carried out by the development team; if the developer had just had a quick look at the data, he would’ve known something was off.
Can you make sure that a key-user tested and validated before you deploy to production?
Manual scripts need to be run to orchestrate the deployment to production. A developer always needs to reserve some spare time to ensure a successful deployment, which makes deploying stressful.
Working in an ALM-compliant way resolves these complaints. Azure DevOps is one of the tools that help you work in a more process-driven way throughout your development lifecycle. The functionalities ‘Boards’, ‘Pipelines’, ‘Repos’ and ‘Test Plans’ make sure that all ALM aspects are covered.
Azure Boards provides Kanban boards and overviews of all kinds of work items. As standard processes, Azure Boards provides the project methodologies ‘Agile’, ‘Scrum’ and ‘CMMI’ (Capability Maturity Model Integration). Depending on the nature of the project (large, medium, small) or the business requirements (the need to follow a certain standard), you can customise these processes until they fit your needs. Manage and plan the availability of your team, plan ahead, budget sprints and prioritise tasks: all this is possible in the Boards module.
E.g. for the development of our XTL Framework, we follow the Agile methodology. Epics, features, user stories, issues and bugs are defined. Based on a sprint planning and the available days per consultant (all of which can be managed within Azure Boards), tasks are resolved. As such, a transparent view on the roadmap of our framework is always available.
Repositories, or Repos, store the different versions of the code you are working on (known as version control). All changes made to your code are saved historically using a combination of branches, merges, commits, pushes and pull requests. A private repository (Git) is used to track the time of each change and the user who made it. As such, for each tiny piece of code, you can retrace which changes have happened per sprint or revert your code to a specific point in time. Linking changes to your code/reports creates a tight integration between your work items and your development: retrace a code change to a specific task and get to know the reason for the alteration.
A common misconception is that ‘all Git repositories are open source and thus globally accessible’. While Azure Repos provides a Git repository, this repository is only accessible by users of the project. All users wanting to view your code need to authenticate using their Microsoft account … which often also requires two-factor-authentication. Hence, you can be sure that the code you commit to your Azure Repository is only accessible by those allowed to see it.
Azure Pipelines makes sure the written code passes some sanity tests (‘builds’ = making sure your code is consistent) and is then deployed in a structured way to new environments. Pipelines can be configured in such a way that no intervention is required from the BI team upon a deployment to a ‘higher’ environment (e.g. from QA to Production). It also makes sure that multiple environments are always kept in sync with each other.
If one of the stakeholders requires a new environment (e.g. a production-like sandbox for business), then this is just a few clicks away.
Given our XTL framework, we use the following stages:
The ‘_XTLFramework’ stage contains a consistent version of the code. A trigger is put in place that always releases to the next phase after a code commit.
The ‘Build’ phase is used as a no-data sanity environment. This is a first filter implemented to reach a higher quality of code and logic.
At the end of each sprint, a business user accepts a new version of the code (= pre-deployment approval), which triggers an automatic deploy to the Test environment. Technical testers will try out the functionalities and check whether the requirements have been met (= post-deployment approval). Upon approval by the technical tester, the code is automatically deployed to the Quality Acceptance environment.
Either the requester of the change or a key-business user (= functional tester) then evaluates the new functionality on the Quality Acceptance environment. Again, a formal approval of a key-user is required to continue to the next stage. Upon acceptance, code is automatically deployed to the ‘Business Evaluation’ environment and ‘Business Sandbox’ environment. The code is not automatically deployed to production.
A ‘pre-deployment’ approval of the BI product owner is required before this version of the code is deployed to production. After acceptance, the release has fully reached all environments and is in sync.
In the setup above, no developer or technical interventions need to happen once the code has been committed. Every phase is pushed by the owner of that specific stage.
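The approval-gated flow above can be sketched as a simple sequence of stages, each unlocked by its owner's sign-off. The snippet below is an illustrative model of that process, not the Azure DevOps API; stage names and approver roles mirror the article.

```python
# Illustrative sketch of the approval-gated release flow described above.
# Stage names and approvers mirror the article; this is NOT the Azure DevOps API.

STAGES = [
    # (stage, approval required before deploying to it)
    ("Build", None),                              # triggered automatically on commit
    ("Test", "business user"),                    # pre-deployment approval
    ("Quality Acceptance", "technical tester"),   # post-deployment approval of Test
    ("Business Evaluation / Sandbox", "key-user"),
    ("Production", "BI product owner"),
]

def release(approvals: set) -> list:
    """Deploy stage by stage; stop at the first missing approval."""
    deployed = []
    for stage, approver in STAGES:
        if approver is not None and approver not in approvals:
            break  # waiting for a formal approval; later stages stay untouched
        deployed.append(stage)
    return deployed

# Without the product owner's sign-off, Production is never reached:
print(release({"business user", "technical tester", "key-user"}))
```

The point of the model: no developer appears anywhere in it — every transition is driven by the owner of that stage, exactly as described above.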
While throughout this article we’ve mainly provided examples of our XTL Framework, we configure CI/CD pipelines for all kinds of software and tools:
– Business Applications: PowerApps, Power BI, Power Automate (formerly Microsoft Flow), …
– On-Premises Applications: SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), on-premises SQL Server databases, …
To demonstrate that newly developed code has been tested and proven to be of high quality, the ‘Test Plans’ module is used. These tests can be executed either manually or automatically. A pop-up shows testers the tests they should perform and allows them to easily log deviations from the expected process. All test results can be accessed and are linked to sprints, work items and branches, thus enabling 360° traceability of the improvements of your BI solution.
Good to Knows & Pricing
Most functionalities are free for up to five ‘contributors’ (= people using DevOps who require more elevated rights than just viewing). As we like to keep our teams lean-and-mean, small to medium projects can often use this service for free. If you plan on using ‘Test Plans’, it is good to know that this requires a more costly license plan. Do you happen to have a Visual Studio Enterprise subscription? Then it is automatically included in the license. Next to that, Azure DevOps is easily accessible by other tools: easily export and manage your own data using Power BI, PowerApps and Power Automate.
Taking into account all the above functionalities, the low pricing and the tight integration, Lytix is convinced that the Azure DevOps suite provides added value in BI projects.
Author: Sander Allert
Interested in a demo or a hands-on session? Wanting to talk to an architect? Send a mail to the author of this article and get in contact: email@example.com
‘Cloud’ or ‘Clowned’? With the high adoption of cloud computing in all processes, ‘Analytics’ is the area that can benefit from it the most. Analytics can take the form of a data warehouse used for reporting or ad-hoc analysis, a data lake which serves as a repository for value-added data, or Big Data, in which one server isn’t sufficient to meet your analytical needs. Typical for analytics are the combination of different sources, a high variation in use (on some days it is used intensively, while on others it is not consulted at all) and the need for lots of computational power.
As a cloud PaaS solution, Snowflake is taking the data world by storm. This tool has a lot of benefits, but is it really as good as it claims to be? Can it address the challenges listed above? This blog post will go through the most prominent advantages and disadvantages of Snowflake.
But… what is PaaS?
Before we dive into the nitty-gritty of Snowflake, it is important we define what a ‘Platform as a Service’ (PaaS) is.
As the name describes, a platform is offered as a service to customers, who can execute the required developments on it without having to take care of the underlying infrastructure. This infrastructure is provided and maintained by the owner of the PaaS solution.
So, everything is set for me to develop! What are the limitations then?
Well, exactly as you say: everything is set up for you, so you cannot change anything in the infrastructure. Any limitations the solution has, you will need to either live with or hope that they are solved by the provider.
This is an issue you don’t have when you develop it yourself. If you need some additional infrastructure developments or changes, you can plan and execute them yourself.
It is always best to look at what the PaaS solution can offer you and whether it ticks all the boxes you need.
Where does Snowflake fit in here?
Snowflake offers a lot of capabilities you can use during your data journey, without the burden of taking care of the environment yourself. All the technical set-up is done for you; hence, it is PaaS.
What makes Snowflake so special besides being PaaS?
We will look into a few aspects that make Snowflake stand out.
One of the main selling points of Snowflake is the fact that it is fully cloud-based and not an on-premises system made available in the cloud.
It’s available on every major cloud service on the market, meaning Azure, AWS and Google Cloud. On top of this, Snowflake maintains a cross-cloud strategy.
This strategy ensures the following:
– the possibility to negotiate the best cloud environment without any vendor lock-in;
– in case your chosen cloud provider faces an outage, you can simply switch to another cloud provider and ensure uptime of your systems.
Data Storage Capacity
You often hear that data is the new currency. Nothing could be more true in the business world. Companies keep collecting data, and so their volumes keep growing.
Luckily, Snowflake has no limit on data storage. As the storage and compute layers are separated, you can just dump all relevant business data into Snowflake, which will then treat it as a data lake (read: cheap storage and an infinite storage amount).
Of course, you might not need a large amount now, but you expect your data volumes to increase in the years to come. If you are provided with a large volume of value-added data, you do not need to hesitate and can add it directly to Snowflake. All data residing within Snowflake (= your data lake) is directly accessible.
Another interesting storage feature is the fact that if you copy a database, it will not take double the space: Snowflake recognises the records that are the same and only requires space for different or new records.
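This copy-on-write behaviour can be illustrated with a small conceptual sketch. The code below mimics the idea — a clone references the same immutable storage blocks, and only changed blocks take new space — it is not Snowflake's actual internals, and the block names are made up.

```python
# Conceptual sketch of zero-copy cloning: a clone references the same
# immutable storage blocks; only changed records need new storage.
# Illustration only -- not Snowflake's actual internals.

class Table:
    def __init__(self, partitions):
        self.partitions = list(partitions)  # references to shared, immutable blocks

    def clone(self):
        return Table(self.partitions)       # instant: shares storage, copies nothing

    def update(self, index, new_block):
        self.partitions[index] = new_block  # copy-on-write: only this block is new

original = Table(["block_a", "block_b", "block_c"])
copy = original.clone()
copy.update(1, "block_b_v2")                # modify one block in the clone

shared = set(original.partitions) & set(copy.partitions)
print(sorted(shared))  # → ['block_a', 'block_c']: only the changed block is stored twice
```

Note that the original table is untouched by the update to the clone; the two versions only diverge where data actually changed.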
Snowflake often uses the phrase ‘pay for what you need’, which is exactly what dynamic scaling is for. This is best explained with an example:
Imagine you run your daily load in the morning between 3 AM and 4 AM. At the start of the working day, a lot of users are interested in the latest insights available within Snowflake (let’s say between 8 AM and 10 AM). For the rest of the day, only one super user sends maybe one query now and then.
In more traditional cost-effective cloud scenarios (= not Snowflake), you would start up your BI system only for the moments in which it is planned to be used (= 3 AM to 4 AM, and 8 AM to 10 AM). Your BI environment takes a while to spin up and spin down.
Snowflake doesn’t incur a delay in spinning up and down and is thus directly accessible, even if no cluster is running. We often configure it in such a way that it immediately spins up whenever someone sends a query and automatically spins down after one minute of inactivity. Paying only when compute is being used will save lots of money for your organisation!
“Snowflake directly spins up when users issue queries at the database, even from an idle state. In addition, Snowflake is automatically paused after one minute of inactivity. This combination saves a high amount of infrastructure costs which would otherwise be billed.”
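A back-of-the-envelope calculation makes the savings from auto-suspend concrete. The credit rate and price below are assumptions for illustration only — actual Snowflake pricing varies by warehouse size, region and edition.

```python
# Rough cost comparison: an always-on warehouse vs. one that auto-resumes
# on queries and auto-suspends after a minute of inactivity.
# The rates below are ASSUMED for illustration, not real Snowflake prices.

CREDITS_PER_HOUR = 1          # assumed rate for a small warehouse
PRICE_PER_CREDIT = 3.0        # assumed dollar price per credit

def monthly_cost(active_hours_per_day: float, days: int = 30) -> float:
    """Compute spend for the hours the warehouse is actually running."""
    return active_hours_per_day * days * CREDITS_PER_HOUR * PRICE_PER_CREDIT

always_on = monthly_cost(24)              # warehouse never suspends
auto_suspend = monthly_cost(1 + 2 + 0.5)  # load (3-4 AM) + peak (8-10 AM) + stray queries

print(f"always-on: ${always_on:.0f}/month, auto-suspend: ${auto_suspend:.0f}/month")
```

Under these assumptions the usage pattern from the example above runs at a fraction of the always-on cost, which is the whole point of pay-per-use compute.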
Instant Adjustable Processing
One of the most interesting features is the ability to change the size of your warehouse: the bigger the size, the faster big queries can be executed. This can once again be changed on the fly, depending on the needs of the organisation at that point in time. Snowflake uses T-shirt sizes (XS, S, M, L, XL, XXL, XXXL) and each size is double the previous one (= Medium equals twice the processing power of Small).
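The doubling rule is easy to express in code. The sketch below assumes XS consumes 1 credit per hour, which matches the doubling described above; treat the exact figures as illustrative.

```python
# The doubling rule for warehouse T-shirt sizes: each size has twice the
# compute (and hourly credit consumption) of the one below it.
# Assumes XS = 1 credit/hour for illustration.

SIZES = ["XS", "S", "M", "L", "XL", "XXL", "XXXL"]
credits_per_hour = {size: 2 ** i for i, size in enumerate(SIZES)}

print(credits_per_hour["M"])                           # 4: a Medium burns 4x an XS
print(credits_per_hour["M"] // credits_per_hour["S"])  # 2: Medium = twice Small
```

A useful corollary: a query that parallelises well can finish in roughly half the time on the next size up, so the total credits consumed stay about the same — you pay more per hour but for fewer hours.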
It is very important to optimise your query usage as much as possible, since you are paying per usage. Luckily, Snowflake has this covered for you!
If you execute a query now and the data doesn’t change, the result will be cached in Snowflake’s engine. This means that the second time you execute the query, it will not cost you anything, as you will not be querying your data directly but the cached version.
Quite a lot of features! Are there any others?
There are. We won’t be going into detail about all of them, but here are some smaller, yet also interesting features.
– Time-Travel: the ability to view data from the past.
– Copy-Clone: you can copy an existing database and it is directly available for use.
– You can query directly on JSON, XML, … files.
– Active Directory integration.
Are there any aspects you would say Snowflake is not suited for?
You must keep in mind that Snowflake is at its core designed for handling Big Data, Data Warehouses and so on. If your interest/use case lies in Data Science, we would suggest you look at tools built with this in mind, such as DataBricks and Spark. More info regarding DataBricks can be found on our blog here.
Interesting! How can I get in touch?
If you need help (development, architecture, strategy, …) on one of your BI implementations, Lytix will help you with this. Lytix believes in the Snowflake platform and is an expert in the advantages and disadvantages it has. Our ETL framework ‘XTL’ uses Snowflake as a database engine and can provide you with a head start in an implementation.
The title accompanying this blog post is quite a mouthful. This blog post will explain why you should be using Spark. If such a use case makes sense for you, we will introduce you to the DataBricks product, which is available on Azure. Being recognised as a Leader in the Magic Quadrant emphasises the operational use and vision of DataBricks.
If you would like to know more about DataBricks, Lytix and Cubis are providing a free expert session on Wednesday the 4th of March 2020 (in ‘De Kantien’ in Ghent, starting at 18:00) where we will provide you with demos to get started with DataBricks on Azure. It doesn’t matter if you’re a business analyst, a C-level, a manager or a student; feel free to subscribe to this intro course using the following link: http://bit.ly/DataBrytix
If you’ve been around for several years in the ‘Data and Analytics’ domain, you’ll probably remember the hype around ‘Big Data’. The most common scenario in which companies switch to Big Data is when they do not have the analytical power to perform a workload on one (and only one) server. Such a workload could be ‘categorising your documents by using Natural Language Processing’, ‘creating segmentations of all website visitors’, ‘training a very accurate prediction model’, … In the most used sense of the word ‘Big Data’, processing is done in a distributed way, i.e. by coordinating several machines at once. Spark is an open-source analytical engine which allows technical users to set up a distributed system, thus allowing companies to tackle their Big Data projects. By default, Spark also gives you the ability to capture streaming events, provides a set of machine learning algorithms and allows for working with graph databases.
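The core idea behind such distributed processing — split the data into partitions, let each "machine" compute a partial result, then combine those results — can be sketched in a few lines. The snippet below is pure Python for illustration; real Spark spreads the same three steps over a cluster of worker nodes.

```python
# Minimal sketch of the map-reduce idea behind Spark: partition the data,
# compute partial results per "machine", then combine them.
# Pure Python illustration -- real Spark distributes this over a cluster.

from collections import Counter

def word_count_distributed(documents: list, n_partitions: int = 3) -> Counter:
    # 1. partition the data (in Spark, data lives spread across worker nodes)
    partitions = [documents[i::n_partitions] for i in range(n_partitions)]
    # 2. map: each partition computes its own partial counts in isolation
    partials = [Counter(w for doc in part for w in doc.split()) for part in partitions]
    # 3. reduce: merge the partial results into the final answer
    return sum(partials, Counter())

docs = ["big data", "big clusters", "data everywhere"]
print(word_count_distributed(docs)["big"])  # → 2
```

Because step 2 never needs data from another partition, adding machines speeds it up almost linearly — which is exactly why this model scales to workloads a single server cannot handle.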
We now know Spark, what is DataBricks?
While Spark is great at what it does, it is hard to maintain and configure, hard to spin up and spin down, and hard to add servers to your cluster and remove them. DataBricks addresses this problem and provides ‘Spark as a Service’, while also adding enterprise-required features. As the majority of the DataBricks product team also created the core of Spark, they have made API and performance improvements to the analytical engine they provide you with. As such, we believe that DataBricks is the most enterprise-ready Big Data and Data Science platform.
Get started in minutes instead of days:
Setting up a DataBricks cluster with several nodes is easy. The wizard ‘New Cluster’ lets you pick your cluster size and the kind of compute it requires: RAM? GPU? Delta-optimised?
The feature ‘Autoscaling’ allows DataBricks to add to or reduce the number of workers within your cluster based on your workload. If the queries sent to your cluster require large amounts of processing power, extra servers will be added to your cluster so you’ll be provided with the results faster!
Delta: ACID Transactions on Data Lakes
An absolute game changer that DataBricks has brought to the market is Delta (Lake). Data lakes are (often) a large collection of files, files which are ‘immutable’ and thus cannot be changed. Delta, on the other hand, enforces ACID properties on your data lake. Adding ACID properties (Atomicity, Consistency, Isolation and Durability) to a data lake allows for other analytical scenarios that involve inserts, updates and deletes.
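What Delta adds can be made concrete with a small sketch of upsert (merge) semantics — updating matching records and inserting new ones over data that, as plain data-lake files, would be immutable. This is not the Delta Lake API, just an illustration of the insert/update behaviour it makes possible.

```python
# Conceptual sketch of an upsert (merge) -- the kind of operation Delta
# makes possible on an otherwise-immutable data lake.
# Illustration of the semantics only, NOT the Delta Lake API.

def merge(target: dict, updates: dict) -> dict:
    """Upsert: update matching keys, insert new ones, leave the rest intact."""
    merged = dict(target)      # build a new consistent version; readers see the
    merged.update(updates)     # old or the new state, never a half-applied one
    return merged

lake = {1: "alice", 2: "bob"}
new = merge(lake, {2: "bobby", 3: "carol"})  # update id 2, insert id 3
print(new)  # → {1: 'alice', 2: 'bobby', 3: 'carol'}
```

Note that the old version is left intact — a new consistent version is produced instead of mutating files in place, which is in spirit how Delta reconciles updates with immutable storage (and why time-travel over old versions is possible).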
The DataBricks Notebook Experience
If you haven’t worked with analytical notebooks (Jupyter, Azure Data Studio or DataBricks), you’re missing out! Being able to write documentation and code within the same document is a big step forward: it helps you make clear to readers why you created the queries and guides them in their first steps in analytics. As opposed to other notebooks, DataBricks can connect to version control (Git, TFS, …) and allows you to combine R, Python, SQL and Scala in the same notebook.
Managing your Data Engineers and Data Scientists
DataBricks allows applying security on folders and workbooks using Azure Active Directory; workbooks which contain sensitive data can only be seen by a specific security group. If you combine this with Azure Data Lake Storage Gen2, which allows applying security on data-folders, you’ll have an enterprise-ready data science environment.
Why shouldn’t I use DataBricks?
Delta Lake will not support operational processes the way an OLTP system does, due to its concurrency being limited to a single file. Thus we, as Lytix and Cubis, often use it for Data Warehousing scenarios, which require little concurrency. As your data is divided over several partitions across many servers and the results of queries/calculations have to pass through your head node, do not expect distributed computing to provide you with snappy behaviour! The technology is meant for analytical workloads on large amounts of data and will backfire if you use it any other way.
Our XTL Framework
In addition to all the features DataBricks provides you with, Lytix/Cubis has a framework which simplifies development and implements best practices. This framework uses Azure Data Factory as an orchestrator and, as a result, provides you with a data layer that is easily accessible by any reporting tool (Power BI, SAP Analytics Cloud, …). As our logging components are automatically wrapped around business transformations, time spent on data engineering is significantly reduced. Using ‘Delta’, we even provide a ‘Data Warehouse on Spark/DataBricks’ solution.
If you want help/guidance on your DataBricks or Azure Platform implementation, do not hesitate to reach out.