Most companies are already familiar with data modelling (be it Kimball or any other modelling technique) and data warehousing with a classical ETL (Extract-Transform-Load) flow. In the age of big data, an increasing number of companies are moving towards a data lake, using Spark to store massive amounts of data. However, we often see that […]
Tag: datascience
Pandas, Koalas and PySpark in Python
If you landed on this page to learn more about animals, I have to disappoint you. Pandas, Koalas and PySpark are all Python packages that serve a similar purpose. Python has steadily gained traction in recent years, as illustrated in the Stack Overflow trends. Originally designed as a general-purpose […]