Introduction: As we discussed earlier, we will start solving Data Engineering problems using SQL (PostgreSQL and MySQL), NoSQL (MongoDB or Cassandra), and Apache Spark (PySpark and Spark SQL). We will work from very easy SQL problems up to difficult ones, and we will also solve problems on data loads (batch, replication, and streaming). Problem Statement: We...
Data Engineering Problem 0 (Employees with salary more than 100K)
Introduction: As we discussed earlier, we will start solving Data Engineering problems using SQL (PostgreSQL and MySQL), NoSQL (MongoDB or Cassandra), and Apache Spark (PySpark and Spark SQL). We will work from very easy SQL problems up to difficult ones, and we will also solve problems on data loads (batch, replication, and streaming). Please find our...
Data Engineering Tool Suite
Introduction: In this blog post, we set up a Data Engineering tool suite in our local environment using Docker. For now, the suite includes the tools listed below; in the future, we will update our Docker files and add more tools. Apache Spark, Jupyter Lab, Package for Delta...