Spark ETL Chapter 9 with Lakehouse | Apache Iceberg

Previous blog/Context: In an earlier blog, we discussed Spark ETL with a lakehouse (using Apache Hudi). Please see the blog post below for more details. https://developershome.blog/2023/03/22/spark-etl-chapter-8-with-lakehouse-apache-hudi/ Introduction: In this blog, we will discuss Spark ETL with Apache Iceberg. We will first understand what Apache Iceberg is and why to use it for creating a lakehouse. We will source...

Data Engineering Problem 3 (Find diff between count of cities and distinct count of cities)

Please see the earlier blogs for an overview of our Data Engineering learning plan and the system setup for Data Engineering. Today we are solving one more data engineering problem and learning new concepts. https://www.youtube.com/watch?v=c6YgtJN43sU&feature=youtu.be Problem statement: Find the difference between the total number of CITY entries in the table and the number of distinct...
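A minimal sketch of the arithmetic behind this problem, assuming the CITY column has already been loaded into a plain Python list (the sample city names below are hypothetical, not from the post). It mirrors SQL semantics, where `COUNT(CITY)` skips NULLs, by dropping `None` values before counting.

```python
def city_count_diff(city_values):
    """Return (number of CITY entries) minus (number of distinct CITY values).

    None values are excluded first, mirroring SQL COUNT(col), which ignores NULLs.
    """
    non_null = [c for c in city_values if c is not None]
    total = len(non_null)            # like COUNT(CITY)
    distinct = len(set(non_null))    # like COUNT(DISTINCT CITY)
    return total - distinct


# Hypothetical sample data standing in for the table's CITY column
cities = ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Chennai"]
print(city_count_diff(cities))  # 6 entries - 4 distinct = 2
```

In SQL, including Spark SQL, the same idea is usually written as a single query of the form `SELECT COUNT(CITY) - COUNT(DISTINCT CITY) FROM <table>`, where `<table>` is a placeholder for the problem's table.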

Deploy Spark Using Docker

In this blog, we will learn how to deploy Spark using Docker. We will use this environment in future posts to learn Spark and solve data engineering problems. Along with Spark, we also want to install a few packages (connectors) so that we don't need to install them separately. When we learn Spark, we...
