Building a notebook-based ETL framework with Spark and Delta Lake

Extracting, transforming and loading (ETL) data from disparate sources has become critical in the last few years with the growth of data science applications. In addition, data availability, timeliness, accuracy and consistency are key requirements at the beginning of any data project.

Analyzing 50 years of Tennis

In this post I will use the Python libraries pandas, matplotlib and seaborn to analyze data from ATP tennis competitions from 1968 to 2018, including Grand Slams, Masters Series, Masters Cup and International Series competitions.

Data Guasu

Welcome to Data Guasu! In this blog I will be sharing opinions, ideas, experiences and details of projects I have worked on, as well as my journey to becoming a data passionist (due to impostor syndrome, I'm having trouble calling myself a data scientist).
