Spark Chapter 12 Spark Streaming with Apache Kafka

Previous blog/Context: In an earlier blog, we discussed Spark ETL with Lakehouse (all the well-known lakehouse formats). See the following blog post for more details: https://developershome.blog/2023/04/05/spark-etl-chapter-11-with-lakehouse-delta-table-optimization/

Introduction: Today, we will discuss the following points: what Apache Kafka is; the basic concepts of Apache Kafka (publisher and subscriber); how to publish and subscribe to messages from the command line...

August 2, 2023

Delta Lake: An Introduction to a High-Performance Data Management System

End-to-End Lakehouse Implementation using Delta Lake

Data is one of the most valuable assets for businesses today, but managing and processing large volumes of data can be a complex and challenging task. Traditional data lakes and big data frameworks offer scalable storage and processing capabilities, but they often lack critical...

May 1, 2023

Spark ETL Chapter 8 with Lakehouse | Apache HUDI

Previous blog/Context: In an earlier blog, we discussed Spark ETL with Lakehouse (with Delta Lake). See the following blog post for more details: https://developershome.blog/2023/03/19/spark-etl-chapter-7-with-lakehouse-delta-lake/

Introduction: In this blog, we will discuss Spark ETL with Apache HUDI. We will first understand what Apache HUDI is and why it is used for creating a lakehouse. We...

March 22, 2023
