First-Time Hadoop Deployment Made Easy – Part 2 – Hadoop Installation and Configuration
A basic Hadoop cluster setup example
Prerequisites
- Pick the machines to build on (this example uses 5 machines, IPs ending in 30-34)
This example uses Ubuntu 18.04, with VMware to make it easy to clone multiple machines
Resources are allocated as follows:
- 1 machine for the NameNode
- 1 machine for the ResourceManager
- 3 machines for Workers
Each machine has: CPU: 4 cores, RAM: 8 GB
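The configuration files below reference test30.example.org (NameNode), test31.example.org (ResourceManager), and test32.example.org (JobHistory server), which suggests every node needs a consistent hostname mapping. A plausible /etc/hosts sketch is shown here; the exact subnet and the worker hostnames test33/test34 are assumptions, not stated in the original:

```
# /etc/hosts sketch - subnet and worker hostnames are assumptions based on "IPs 30-34"
192.168.1.30  test30.example.org  test30   # NameNode
192.168.1.31  test31.example.org  test31   # ResourceManager
192.168.1.32  test32.example.org  test32   # Worker + JobHistory server
192.168.1.33  test33.example.org  test33   # Worker
192.168.1.34  test34.example.org  test34   # Worker
```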
Continuing from Part 1: once the base environment is configured, proceed with the Hadoop installation and configuration.
Download and install Hadoop (as administrator)
- Download
cd
wget http://ftp.tc.edu.tw/pub/Apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
- If the mirror link is dead, download it from the official Apache site
- Extract
tar -tvf hadoop-3.2.1.tar.gz   # inspect the archive contents first
tar -xvf hadoop-3.2.1.tar.gz -C /usr/local
- Rename
mv /usr/local/hadoop-3.2.1 /usr/local/hadoop
- Change the owner of the directory and its files
chown -R hadoop:hadoop /usr/local/hadoop
Set the hadoop user's environment variables (as the hadoop user)
- Edit .bashrc
nano ~/.bashrc
# Set HADOOP_HOME
export HADOOP_HOME=/usr/local/hadoop
# Set HADOOP_MAPRED_HOME
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
# Add Hadoop bin and sbin directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- Reload the configuration file
source ~/.bashrc   # or equivalently: . ~/.bashrc
- Check the environment variables
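A quick way to confirm the variables took effect is to echo them. The sketch below repeats the exports from .bashrc so it can run standalone; after sourcing .bashrc you would only need the echo lines:

```shell
# Same values as set in ~/.bashrc above
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Print them to verify
echo "HADOOP_HOME=$HADOOP_HOME"
echo "HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME"
```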
- Edit the Hadoop runtime environment script (as the hadoop user)
nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
- Edit Hadoop core-site.xml (as the hadoop user)
nano /usr/local/hadoop/etc/hadoop/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data</value>
<description>Temporary Directory.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://test30.example.org</value>
<description>Use HDFS as file storage engine</description>
</property>
- Hadoop 3.2.0 and later provide a command to check configuration syntax
hadoop conftest
- See Hortonworks' recommended parameter settings for each component for reference
- Edit Hadoop mapred-site.xml (as the hadoop user)
nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1638m</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx3276m</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx3276m</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>819</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>test32.example.org:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>test32.example.org:19888</value>
</property>
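The memory values above follow a common rule of thumb (my reading, not stated in the source): each `*.java.opts` heap (-Xmx) is roughly 80% of its container's `memory.mb`, and `mapreduce.task.io.sort.mb` is about half the map heap. The arithmetic matches the configured values:

```shell
# -Xmx is ~80% of the container size
echo $(( 2048 * 8 / 10 ))   # map heap: 1638, matches mapreduce.map.java.opts
echo $(( 4096 * 8 / 10 ))   # reduce / AM heap: 3276, matches the -Xmx3276m values
# io.sort.mb is ~50% of the map heap
echo $(( 1638 / 2 ))        # 819, matches mapreduce.task.io.sort.mb
```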
- Edit Hadoop yarn-site.xml (as the hadoop user)
nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>6144</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>6144</value>
</property>
<property>
<name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>test31.example.org</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
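The YARN numbers above are consistent with reserving about 2 GB of each 8 GB node for the OS and Hadoop daemons; at the 2048 MB minimum allocation, each worker can then run at most three containers. A quick sanity check (the 2 GB reservation is an inference, not stated in the source):

```shell
node_ram_mb=8192                       # each machine has 8 GB RAM
nm_mem_mb=6144                         # yarn.nodemanager.resource.memory-mb
echo $(( node_ram_mb - nm_mem_mb ))    # 2048 MB left for OS and daemons
echo $(( nm_mem_mb / 2048 ))           # 3 containers per node at the minimum allocation
```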
- Edit Hadoop hdfs-site.xml (as the hadoop user)
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
<description>The name of the group of super-users. The value should be a single group name.</description>
</property>
- Create the Hadoop workers file (as administrator)
nano /usr/local/hadoop/etc/hadoop/workers
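The workers file lists one worker hostname per line. Given the three worker machines and the test3x.example.org naming used in the configs above, it would plausibly contain the following; test33 and test34 are assumed names not stated in the original:

```
test32.example.org
test33.example.org
test34.example.org
```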