Data Engineering Problem 0 (Employees with salary more than 100K)

Introduction: As discussed earlier, we will start solving Data Engineering problems using SQL (PostgreSQL and MySQL), NoSQL (MongoDB or Cassandra), and Apache Spark (PySpark and Spark SQL). We will work from very easy SQL problems up to difficult ones, and we will also solve problems involving data loads (batch, replication, and streaming). Please find our... Continue Reading →

February 5, 2023

Data Engineering Tool Suite

Introduction: In this blog post, we set up a Data Engineering tool suite in our local environment using Docker. For now, the suite includes the tools listed below; in the future, we will update our Docker files and add more tools. Apache Spark, Jupyter Lab, Package for Delta... Continue Reading →

January 30, 2023

Deploy Spark Using Docker

In this blog, we will learn how to deploy Spark using Docker. We will use this environment in the future for learning Spark and solving data engineering problems. Along with Spark, we also want to install a few packages (connectors) so that we don't need to install them separately. When we will learn Spark, we... Continue Reading →

January 29, 2023
