# hadoop
## Prerequisites
Install JDK on each node, see "".
Get shell script [`install_java_bin`](https://github.com/lasyard/coding/blob/main/shell/install_java_bin.sh).
Download the java binary packages:
```console
$ curl -LO https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
```
Check sum:
```console
$ sha512sum hadoop-3.4.1.tar.gz
09cda6943625bc8e4307deca7a4df76d676a51aca1b9a0171938b793521dfe1ab5970fdb9a490bab34c12a2230ffdaed2992bad16458169ac51b281be1ab6741 hadoop-3.4.1.tar.gz
```
Set password-free login to all workers (even it is the same node where the commands are emitted) for the user used (in this case, it is `ubuntu`).
## Deploy
Install the java packages on each node:
```console
$ install_java_bin hadoop hadoop-3.4.1.tar.gz /opt
$ sudo chown ubuntu:ubuntu /opt/hadoop
```
Set environment variables on each node:
```console
$ echo "export HADOOP_HOME=\"/opt/hadoop\"" | sudo tee -a /etc/profile.d/hadoop.sh
$ echo "export HADOOP_CLASSPATH=\"\$(\${HADOOP_HOME}/bin/hadoop classpath)\"" | sudo tee -a /etc/profile.d/hadoop.sh
```
### Configure
Edit file `/opt/hadoop/etc/hadoop/hadoop-env.sh`:
:::{literalinclude} /_files/ubuntu/opt/hadoop/etc/hadoop/hadoop-env.sh
:diff: /_files/ubuntu/opt/hadoop/etc/hadoop/hadoop-env.sh.orig
:::
Edit file `/opt/hadoop/etc/hadoop/core-site.xml`:
:::{literalinclude} /_files/ubuntu/opt/hadoop/etc/hadoop/core-site.xml
:diff: /_files/ubuntu/opt/hadoop/etc/hadoop/core-site.xml.orig
:::
These files need to be copied to all nodes to the same path.
Create the directory for `${hadoop.tmp.dir}` on each node:
```console
$ sudo mkdir -p /opt/tmp/hadoop
$ sudo chown ubuntu:ubuntu /opt/tmp/hadoop
```
Edit file `/opt/hadoop/etc/hadoop/workers`:
:::{literalinclude} /_files/ubuntu/opt/hadoop/etc/hadoop/workers
:diff: /_files/ubuntu/opt/hadoop/etc/hadoop/workers.orig
:::
#### Configure hdfs
Edit file `/opt/hadoop/etc/hadoop/hdfs-site.xml`:
:::{literalinclude} /_files/ubuntu/opt/hadoop/etc/hadoop/hdfs-site.xml
:diff: /_files/ubuntu/opt/hadoop/etc/hadoop/hdfs-site.xml.orig
:::
This file need to be copied to all nodes to the same path.
#### Configure yarn
Edit file `/opt/hadoop/etc/hadoop/yarn-site.xml`:
:::{literalinclude} /_files/ubuntu/opt/hadoop/etc/hadoop/yarn-site.xml
:diff: /_files/ubuntu/opt/hadoop/etc/hadoop/yarn-site.xml.orig
:::
Edit file `/opt/hadoop/etc/hadoop/mapred-site.xml`:
:::{literalinclude} /_files/ubuntu/opt/hadoop/etc/hadoop/mapred-site.xml
:diff: /_files/ubuntu/opt/hadoop/etc/hadoop/mapred-site.xml.orig
:::
These files need to be copied to all nodes to the same path.
### Run
Check the version:
```console
$ hadoop version
Hadoop 3.4.1
Source code repository https://github.com/apache/hadoop.git -r 4d7825309348956336b8f06a08322b78422849b1
Compiled by mthakur on 2024-10-09T14:57Z
Compiled on platform linux-x86_64
Compiled with protoc 3.23.4
From source with checksum 7292fe9dba5e2e44e3a9f763fce3e680
This command was run using /opt/hadoop-3.4.1/share/hadoop/common/hadoop-common-3.4.1.jar
```
Init hdfs:
```console
$ hdfs namenode -format
...
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at las0/10.225.4.51
************************************************************/
```
Start hdfs:
```console
$ start-dfs.sh
Starting namenodes on [las0]
Starting datanodes
Starting secondary namenodes [las0]
```
Start yarn:
```console
$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
```
Show java processes:
```console
$ jps -lm
2509842 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
2510720 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
2509278 org.apache.hadoop.hdfs.server.namenode.NameNode
2511597 sun.tools.jps.Jps -lm
2509499 org.apache.hadoop.hdfs.server.datanode.DataNode
2510937 org.apache.hadoop.yarn.server.nodemanager.NodeManager
```
Stop them:
```console
$ stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
$ stop-dfs.sh
Stopping namenodes on [las0]
Stopping datanodes
Stopping secondary namenodes [las0]
```
## Usage
### hdfs
```console
$ hdfs dfs -ls /
$ hdfs dfs -mkdir -p /user/ubuntu
$ echo 'Hello world!' > file.dat
$ hdfs dfs -put file.dat
$ hdfs dfs -cat file.dat
Hello world!
$ hdfs dfs -rm file.dat
Deleted file.dat
```
The hadoop web UI is available at `http://las0:9870`.
#### Safe mode
Show current safe mode status:
```console
$ hdfs dfsadmin -safemode get
Safe mode is OFF
```
Enter safe mode:
```console
$ hdfs dfsadmin -safemode enter
Safe mode is ON
```
Leave safe mode:
```console
$ hdfs dfsadmin -safemode leave
Safe mode is OFF
```
:::{note}
A freshly started/restarted NameNode is in safe mode temporarily. It will leave safe mode automatically.
:::
#### Clear all data
If you want to clear the hdfs data, stop hdfs and run the following commands on each node:
```console
$ rm -rf /opt/tmp/hadoop/dfs/*
```
### yarn
The yarn web UI is available at `http://las0:8088` or `http://las0:8088/ui2/`.