Introduction:
Here, we will discuss what we are going to explore and learn in the next few days. We will also discuss which tools need to be installed and configured for learning and practicing key Data Engineering concepts.
Below is a list of the technologies we are planning to learn in the next few days. (Please also have a look at this blog: Big Data Engineering Skills)
Note: I will update the links for each blog series once it is available. (For now, I am planning to write and prepare 2 to 3 videos a week.)
- Understanding of Docker (Docker blog series available here: Docker Blog Series)
- Understanding of SQL
  - By solving SQL problems and understanding key concepts of SQL
- Understanding of NoSQL databases
  - MongoDB
- Learning Spark from scratch (we will also discuss Python, as we will be focusing more on PySpark)
  - Solving SQL problems with Spark (PySpark)
  - Apache Hive (we will also use Hive for solving problems)
- Understanding of Data Warehousing and the open-source tools related to it (Apache Hive, Apache Impala, and Apache Pig)
  - Apache Hive (we will also use this with Apache Spark)
  - Apache Impala
  - Apache Pig
- Data Services in the Cloud (Azure, AWS, and GCP)
  - Azure Data Services (including Synapse)
  - AWS Data Services (including Redshift)
  - GCP Data Services (including BigQuery)
- Snowflake
System Setup for Data Engineering
These are the tools we will need to practice Data Engineering problems and learn new concepts:
- Docker Desktop
  - Link for download
  - We will use Docker to download the platforms we need, such as Apache Spark, PostgreSQL, MySQL, Apache Kafka, and others, when required.
- Visual Studio Code
  - Link for download
  - We will install VS Code extensions such as Jupyter Notebook, GitHub, Docker, and others when required.
- pgAdmin
  - Link for download
  - We will use this for connecting to PostgreSQL and accessing all the database functionality.
- MySQL Workbench
  - Link for download
  - We will use this for connecting to MySQL and accessing all the database functionality.
- MongoDB Compass
  - Link for download
  - We will use this for connecting to MongoDB and accessing all the database functionality.
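Once the tools above are installed, a quick way to confirm that the command-line ones are reachable is a small Python check like the sketch below. The command names are assumptions: Docker Desktop normally provides a `docker` command, and VS Code provides `code` only if its shell command has been added to your PATH. The function name `find_missing_tools` is just an illustrative helper. pgAdmin, MySQL Workbench, and MongoDB Compass are desktop applications, so they are not covered here.

```python
import shutil

# Assumed command names: `docker` comes with Docker Desktop, and `code`
# is available only if the VS Code shell command is on your PATH.
REQUIRED_COMMANDS = {
    "Docker Desktop": "docker",
    "Visual Studio Code": "code",
}


def find_missing_tools(commands=REQUIRED_COMMANDS):
    """Return the names of tools whose command is not found on PATH."""
    return [name for name, cmd in commands.items() if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All command-line tools were found.")
```

If a tool shows up as missing even though it is installed, it usually means its command has not been added to PATH yet.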
After installing all of this software, we will install Spark, MySQL, and PostgreSQL. Please follow the next blog for those steps.
Please use the blog below to install Apache Spark using Docker.
Use the blog below to install the Data Engineering Tool Suite.
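To give a rough idea of what installing a database through Docker looks like, here is a minimal sketch that only builds and prints `docker run` commands for PostgreSQL and MySQL. It is an illustration, not necessarily the exact setup the next blog uses. The container names (`de-postgres`, `de-mysql`) and the password `example` are placeholders I have assumed. The image names, default ports, and the `POSTGRES_PASSWORD` / `MYSQL_ROOT_PASSWORD` variables are those of the official images. Spark is left out because it has its own blog.

```python
import shlex

# Placeholder passwords (assumptions); change them before real use.
SERVICES = {
    "postgres": {
        "image": "postgres",
        "port": 5432,
        "env": {"POSTGRES_PASSWORD": "example"},
    },
    "mysql": {
        "image": "mysql",
        "port": 3306,
        "env": {"MYSQL_ROOT_PASSWORD": "example"},
    },
}


def docker_run_command(name, spec):
    """Build a `docker run` command (as a list of arguments) for one service."""
    cmd = ["docker", "run", "-d", "--name", f"de-{name}"]
    for key, value in spec["env"].items():
        cmd += ["-e", f"{key}={value}"]
    # Map the container's default port to the same port on the host.
    cmd += ["-p", f"{spec['port']}:{spec['port']}", spec["image"]]
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing starts by accident.
    for name, spec in SERVICES.items():
        print(shlex.join(docker_run_command(name, spec)))
```

You can copy a printed command into a terminal to start the container, and then connect to it from pgAdmin or MySQL Workbench on the mapped port.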