# spark

## Prerequisites

Install JDK on each node, see "".

Get the shell script [`install_java_bin`](https://github.com/lasyard/coding/blob/main/shell/install_java_bin.sh).

Download the Spark binary package:

```console
$ curl -LO https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3-scala2.13.tgz
```

Verify the checksum:

```console
$ sha512sum spark-3.5.4-bin-hadoop3-scala2.13.tgz
9691435f42525a34d67564d397fed1a2380d27dcd5bfd309cd77eb8c26e6cc7a3fdfc4c8b8c501bb8740ee36dd8562b3c15e6bba7c75ea12899fbf5136442a91  spark-3.5.4-bin-hadoop3-scala2.13.tgz
```

## Deploy

Install the package on each node:

```console
$ install_java_bin spark spark-3.5.4-bin-hadoop3-scala2.13.tgz /opt
$ sudo chown ubuntu:ubuntu /opt/spark
```

### Configure

Copy the file `/opt/spark/conf/workers.template` to `/opt/spark/conf/workers` and edit it:

:::{literalinclude} /_files/ubuntu/opt/spark/conf/workers
:diff: /_files/ubuntu/opt/spark/conf/workers.orig
:::

### Run

Start the Spark cluster on the master node:

```console
$ /opt/spark/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.master.Master-1-las0.out
las2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.worker.Worker-1-las2.out
las1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.worker.Worker-1-las1.out
las0: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.worker.Worker-1-las0.out
```

:::{caution}
The `hadoop` distribution contains a script with the same name, `start-all.sh`. Do not run the wrong one.
:::

Show the Java processes:

```console
$ jps -lm
3404602 org.apache.spark.deploy.master.Master --host las0 --port 7077 --webui-port 8080
3406925 sun.tools.jps.Jps -lm
3404796 org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://las0:7077
```

The Spark web UI is available at `http://las0:8080`.

:::{note}
Each Spark worker's web UI binds to port `8081`, which conflicts with [Flink](project:flink.md).
:::

To stop the Spark cluster:

```console
$ /opt/spark/sbin/stop-all.sh
las0: stopping org.apache.spark.deploy.worker.Worker
las2: stopping org.apache.spark.deploy.worker.Worker
las1: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
```

## Usage

Submit Java programs:

```console
$ spark-submit --master spark://las0:7077 --class org.apache.spark.examples.JavaSparkPi /opt/spark/examples/jars/spark-examples_2.13-3.5.4.jar
...
Pi is roughly 3.1409
...
```

Submit Python programs:

```console
$ spark-submit --master spark://las0:7077 /opt/spark/examples/src/main/python/pi.py 10
...
Pi is roughly 3.133040
...
```
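
A custom Python job can be submitted the same way as the bundled `pi.py`. The following is a minimal sketch, not part of the Spark distribution: the file name `my_pi.py`, the helper `count_hits`, and the default partition and sample counts are assumptions chosen for illustration. It estimates Pi by Monte Carlo sampling, in the same spirit as the bundled example.

```python
# my_pi.py (hypothetical file name): estimate Pi by Monte Carlo sampling.
import random
import sys


def count_hits(samples, seed):
    """Count random points in the unit square that fall inside the unit circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def main(partitions=2, samples_per_partition=100_000):
    # Import Spark lazily so that count_hits stays usable without Spark.
    try:
        from pyspark.sql import SparkSession
    except ImportError:
        print("pyspark is not available; run this script with spark-submit",
              file=sys.stderr)
        return

    spark = SparkSession.builder.appName("MyPi").getOrCreate()
    try:
        # One seed per partition; each task samples independently.
        seeds = spark.sparkContext.parallelize(range(partitions), partitions)
        hits = seeds.map(lambda s: count_hits(samples_per_partition, s)).sum()
        total = partitions * samples_per_partition
        print("Pi is roughly %f" % (4.0 * hits / total))
    finally:
        spark.stop()


if __name__ == "__main__":
    main()
```

Assuming the script is saved as `my_pi.py` in the current directory, it could be submitted with `spark-submit --master spark://las0:7077 my_pi.py`. The master URL is left out of the script on purpose, so the same file works against whichever cluster `--master` names.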