Spark
Prerequisites
Install a JDK on each node; see “Install Java Development Environment”.
Get the install_java_bin shell script.
Download the Spark binary package:
curl -LO https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3-scala2.13.tgz
Verify the checksum:
$ sha512sum spark-3.5.4-bin-hadoop3-scala2.13.tgz
9691435f42525a34d67564d397fed1a2380d27dcd5bfd309cd77eb8c26e6cc7a3fdfc4c8b8c501bb8740ee36dd8562b3c15e6bba7c75ea12899fbf5136442a91 spark-3.5.4-bin-hadoop3-scala2.13.tgz
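The comparison can also be left to sha512sum itself instead of being done by eye. A minimal sketch, assuming GNU coreutils and an expected digest obtained from a trusted source; verify_sha512 is a hypothetical helper name, not part of Spark or of install_java_bin:

```shell
# verify_sha512 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only if FILE's SHA-512 digest equals EXPECTED_HEX.
verify_sha512() {
    # sha512sum -c reads "DIGEST  FILENAME" lines (two spaces) from stdin
    # and re-computes the digest of each named file.
    printf '%s  %s\n' "$2" "$1" | sha512sum -c --quiet -
}
```

Called with spark-3.5.4-bin-hadoop3-scala2.13.tgz and the digest shown above, it prints nothing and returns 0 on a match; on a mismatch it reports FAILED and returns a non-zero status.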
Deploy
Install the package on each node:
$ install_java_bin spark spark-3.5.4-bin-hadoop3-scala2.13.tgz /opt
$ sudo chown ubuntu:ubuntu /opt/spark
Configure
Copy /opt/spark/conf/workers.template to /opt/spark/conf/workers, then replace the default localhost entry with the worker host names:
#
# A Spark Worker will be started on each of the machines listed below.
-localhost
+las0
+las1
+las2
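The same edit can be scripted when the host list changes often. A rough sketch, assuming the template contains a bare localhost line as shown above; write_workers is a hypothetical helper, and the conf directory is passed as an argument so the sketch can be tried on a copy first:

```shell
# write_workers CONF_DIR HOST...
# Hypothetical helper: copies workers.template to workers, drops the default
# "localhost" entry, and appends one line per HOST.
write_workers() {
    conf_dir=$1
    shift
    # Keep the template's comment header; remove only the bare "localhost" line.
    sed '/^localhost$/d' "$conf_dir/workers.template" > "$conf_dir/workers"
    printf '%s\n' "$@" >> "$conf_dir/workers"
}

# Example: write_workers /opt/spark/conf las0 las1 las2
```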
Run
Start the Spark cluster on the Master node:
$ /opt/spark/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.master.Master-1-las0.out
las2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.worker.Worker-1-las2.out
las1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.worker.Worker-1-las1.out
las0: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.worker.Worker-1-las0.out
Caution
The Hadoop distribution ships a script with the same name, start-all.sh. Use the full path shown above so that the Spark script is the one that runs.
List the running Java processes:
$ jps -lm
3404602 org.apache.spark.deploy.master.Master --host las0 --port 7077 --webui-port 8080
3406925 sun.tools.jps.Jps -lm
3404796 org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://las0:7077
The Spark master web UI is available at http://las0:8080.
Note
Each Spark worker’s web UI binds to port 8081, which conflicts with Flink.
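One way around the conflict is to move the worker UI to another port. A minimal sketch, assuming /opt/spark/conf/spark-env.sh exists on every worker node (it can be copied from spark-env.sh.template) and that port 8082 is free; the port number is an arbitrary choice, not something this guide prescribes:

```shell
# /opt/spark/conf/spark-env.sh (on each worker node)
# SPARK_WORKER_WEBUI_PORT sets the worker web UI port (default 8081).
# 8082 is an assumed free port; pick any port that does not clash with Flink.
export SPARK_WORKER_WEBUI_PORT=8082
```

Restart the cluster for the change to take effect.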
To stop the Spark cluster:
$ /opt/spark/sbin/stop-all.sh
las0: stopping org.apache.spark.deploy.worker.Worker
las2: stopping org.apache.spark.deploy.worker.Worker
las1: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
Usage
Submit a Java program:
$ spark-submit --master spark://las0:7077 --class org.apache.spark.examples.JavaSparkPi /opt/spark/examples/jars/spark-examples_2.13-3.5.4.jar
⋮
Pi is roughly 3.1409
⋮
Submit a Python program:
$ spark-submit --master spark://las0:7077 /opt/spark/examples/src/main/python/pi.py 10
⋮
Pi is roughly 3.133040
⋮