Lytix’ XTL Framework
You’ll often hear a Lytix consultant say that he/she is proud of our XTL Framework. What is it exactly? And how can it improve your business?
Why do I need a Data Warehouse (DWH) ?
Data Warehouses support organisations in their decision making by providing data in a well-governed way, this includes:
– Integration of source systems: organisations prefer a mix of solutions to achieve a competitive advantage. Often, large ERP systems (Dynamics AX for Finance and Operations, SAP, Odoo, …) are combined with niche products (SalesForce, Google Analytics, Silverfin, Exact Online, … ) to support their business processes. In an analytical setting, this data also needs to be enriched with open-source APIs (weather APIs, ID verification, … ).
– Single source of truth: valuable time can be wasted if numbers cannot be trusted. A DWH provides you one validated catalogue of data.
– Easy Managed Access: Data is provided in a way that business users can crunch large quantities of data. This, while still applying security; users should be able to see only what they’re entitled to see (no more, no less).
– A foundation for Data Science / AI: Provide a well-structured data repository for your data scientists and minimize their valuable time of cleaning-up and combining sources.
Why do I need the Lytix’ XTL Framework?
Lack of internal engineering experience can result in an unsustainable solution on the long-term. It gets even worse if this person leaves the organisation. Our framework provides unified dimensional models and minimizes ETL engineering.
Advantages of the XTL Framework:
– Unified way of Working: In each of our supported technologies (Azure SQL DB, SnowFlake, DataBricks and Azure Synapse), the main task of engineers is to convert business logic to SQL, a query language that is widely known and which can be relocated to other technologies.
– Maximized logging (with minimal effort during development): Logging modules are automatically wrapped around ETL statements, which reduces a time-intensive burden of developers. Within these modules, all events are captured: number of reads, inserts, updates and deletes, as well as warnings and errors.
– Built for the cloud: The framework is built in such a way it take advantage of cloud benefits; pay-as-you-use, scale-up and scale-down easily.
– Applied Kimball: The modelling techniques are based on best-practices defined by the renowned data-modeller Ralph Kimball (The Data Warehouse Toolkit, Ralph Kimbal & Margy Ross). The most-used and popular Slowly Changing Dimension (SCD) types are supported, such as ‘SCD0: Retain the Original’, ‘SCD1: Overwrite with newer values’, ‘SCD2: Track history by adding new rows’, ‘SCD3: Add new attribute’, ‘SCD6: Add type 1 attributes to a type 2 dimension’ and ‘SCD7: Dual Type 1 and Type 2 Dimensions’. Switching an attribute from SCD Type 1 to SCD Type 2 (… and thus tracking history) is just a matter of a click.
The XTL framework is accompanied by a comprehensive report providing you with insights on the ETL processes. This report both provide you standard metric evolutions (number of updates, deletes, failures, …) as well as a dependency view which maps your sources to the targets and shows you the applied transformations.
Our framework allows to automatically generate dependency trees based on the metadata of your transformations. Generating the most optimal parallel execution steps within your whole ETL becomes child play. Do you need to refresh only one, two or three tables because you need updated data fast (e.g. for month-closing)? No problem, the dependency tree will only trigger those ETL steps which are directly related to the tables you want updated.
No ‘All Or Nothing Approach’ in Batches
We often hear following complaint from customers not using our framework “I know one little package failed, but does that mean that the whole dataflow to the business should fail?!“
Our framework adresses this issue! The dependency tree allows you to decide what to do upon failures, will you just ignore the failure and continue with your ETL? Or do you stop abruptly to make 100% sure the error doesn’t cause any future problems? This is fully configurable in our framework and it allows you to make a decision per ETL step.
Data Quality Reporting
Our framework includes a ‘Data Quality’ module. By inputting business rules, the framework will look for inconsistencies and malicious input and report the findings back to the business.
Following components are used throughout the framework:
– Azure Data Factory is used as orchestrator and scheduler. This component kicks-off at the requested time (at midnight, once per hour, …) and makes sure all dependencies are initiated in logical order.
– A ‘Storage and Compute Module’ to make the necessary transformations and to provide the data to data consumers; being it business users, data scientists or even mobile devices and webportals. One of four technologies can be implemented as a back-end for this component:
o Azure SQL Database: A Microsoft product for customers who are already familiar with SQL Servers and in which the Data Warehouse needs lots of integrations.
o SnowFlake provides both a data lake and strong compute for your analytical needs.
o DataBricks: if streaming and distributed computing are necessary to build an all-comprehensive Data Warehousing solution. This platform integrates best with organisations that have a large data science team!
o Azure Synapse (former Azure SQL Data Warehouse) providing large limitless compute for customers with more than 4 TBs of data.
– An analytics engine which allows business users to create their own reports, being it Azure Analysis Services or Power BI Pro/Premium.
This article describes one of our proven ways to fuel your data driven organization, covering a multitude of use-cases ranging from traditional reports to data science and this in a governed way.
In need of some help? Do not hesitate to contact Lytix or the author Sander Allert and we/he will be glad to exchange ideas on this subject!
Sander Allert is an experienced BI architect with a passion for following new trends. Sander is passionate about data in all of its aspects (Big Data, Data Science, Self-Service BI, Master Data, …) and loves to share his knowledge. Do you need help on architectural decisions, do not hesitate to invite Sander over for a coffee to share some ideas.