Previous blog/Context: In an earlier post, we discussed Spark ETL with cloud data lakes (AWS S3 bucket). See that post for more details: https://developershome.blog/2023/03/12/spark-etl-chapter-4-with-cloud-data-lakes-aws-s3-bucket/ Introduction: In this post, we will discuss Hive tables and views and perform ETL with Hive tables. We will learn how to create global and temporary Hive tables...
Spark ETL Chapter 2 with NoSQL Database (MongoDB | Cassandra)
Previous blog/Context: In an earlier post, we discussed Spark ETL with SQL databases (MySQL and PostgreSQL). See that post for more details: https://developershome.blog/2023/03/06/spark-etl-with-sql-databases-mysql-postgresql/ Introduction: In this post, we will discuss Spark ETL with NoSQL databases. We focus on MongoDB and perform all the Spark ETL steps against a MongoDB database. All...
Data Engineering Tool Suite
Introduction: In this blog post, we set up a Data Engineering tool set on our local environment using Docker. For now, the tool suite includes the tools listed below. In the future, we will update our Docker files and add more tools. Apache Spark, Jupyter Lab, Package for Delta...