Docker Demo
A Demo using Docker containers
Let's use a real-world example to see how Hudi works end to end. For this purpose, a self-contained data infrastructure is brought up in a local Docker cluster on your computer. This requires the Hudi repository to be cloned locally.
The steps have been tested on a Mac laptop.
Prerequisites
- Clone the Hudi repository to your local machine.
- Docker Setup: On a Mac, follow the steps in Install Docker Desktop on Mac. To run Spark-SQL queries, make sure at least 6 GB of memory and 4 CPUs are allocated to Docker (see Docker -> Preferences -> Advanced). Otherwise, Spark-SQL queries may be killed because of memory issues.
- kcat: A command-line utility to publish to and consume from Kafka topics. Use brew install kcat to install kcat.
- /etc/hosts: The demo refers to many services running in containers by hostname. Add the following entries to /etc/hosts:
    127.0.0.1 adhoc-1
    127.0.0.1 adhoc-2
    127.0.0.1 namenode
    127.0.0.1 datanode1
    127.0.0.1 hiveserver
    127.0.0.1 hivemetastore
    127.0.0.1 kafkabroker
    127.0.0.1 sparkmaster
    127.0.0.1 zookeeper
- Java: Java SE Development Kit 8.
- Maven: A build automation tool for Java projects.
- jq: A lightweight and flexible command-line JSON processor. Use brew install jq to install jq.
The demo has not been tested on other environments, such as Docker on Windows.
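Before starting the demo, it may help to confirm that the prerequisites above are in place. The following is a minimal sketch of such a check, not something shipped with Hudi: the function names missing_tools and missing_hosts and the HOSTS_FILE override are hypothetical, and the executable names java and mvn are assumed to be the ones your JDK and Maven installations put on the PATH. It reports only what is missing and does not modify /etc/hosts.

```shell
#!/bin/sh
# Preflight sketch for the demo prerequisites (illustrative helper names).

# Hostnames the demo expects to map to 127.0.0.1, copied from the list above.
DEMO_HOSTS="adhoc-1 adhoc-2 namenode datanode1 hiveserver hivemetastore kafkabroker sparkmaster zookeeper"

# Tools named in the prerequisites; "java" and "mvn" are assumed executable names.
DEMO_TOOLS="docker kcat jq java mvn"

# Hosts file to inspect; override HOSTS_FILE to check a different file.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

# Print each tool from the argument list that is not found on PATH.
missing_tools() {
  for mt_tool in "$@"; do
    command -v "$mt_tool" >/dev/null 2>&1 || echo "$mt_tool"
  done
}

# Print each hostname that has no active 127.0.0.1 entry in the given file.
# Usage: missing_hosts FILE HOST...
missing_hosts() {
  mh_file="$1"
  shift
  for mh_host in "$@"; do
    # Require 127.0.0.1 at the start of the line and the hostname as a whole
    # word before any "#" comment; commented-out lines do not count.
    grep -Eq "^[[:space:]]*127\.0\.0\.1[[:space:]]+([^#]*[[:space:]])?${mh_host}([[:space:]]|#|\$)" \
      "$mh_file" 2>/dev/null || echo "$mh_host"
  done
}

# Report findings. The variables are left unquoted in echo on purpose so the
# names are joined onto a single line.
tools_missing=$(missing_tools $DEMO_TOOLS)
hosts_missing=$(missing_hosts "$HOSTS_FILE" $DEMO_HOSTS)
[ -z "$tools_missing" ] || echo "Tools not found on PATH:" $tools_missing
[ -z "$hosts_missing" ] || echo "Hostnames missing from $HOSTS_FILE:" $hosts_missing
```

If the script prints nothing, every listed tool was found and every hostname has a 127.0.0.1 entry. It does not check the Docker memory and CPU allocation, which still needs to be verified in the Docker Desktop preferences.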