Your First Hadoop Deployment – Part 5 – Cluster High Availability (HA) Setup
Perform these steps only after the Hadoop cluster has been configured and its services have been started
Warning: the following steps are fairly complex and require some hands-on experience
* Prerequisites:
1. Three worker machines to act as the JournalNodes and the ZooKeeper ensemble
2. The original NameNode machine additionally hosts a standby ResourceManager
3. The original ResourceManager machine additionally hosts a standby NameNode
-
Make sure the cluster's HDFS and YARN services are both stopped before you begin !!
-
Add the following to hdfs-site.xml, then SCP it to the other machines (hadoop user)
- See the SCP example below for usage …
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>nncluster</value>
</property>
<property>
<name>dfs.ha.namenodes.nncluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nncluster.nn1</name>
<value>test30.example.org:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.nncluster.nn1</name>
<value>test30.example.org:9870</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nncluster.nn2</name>
<value>test31.example.org:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.nncluster.nn2</name>
<value>test31.example.org:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://test32.example.org:8485;test33.example.org:8485;test34.example.org:8485/nncluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/journalnode</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.nncluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
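- Note: shell(/bin/true) is a no-op fencing method. It is workable here because the Quorum Journal Manager only ever lets one NameNode write edits, but production clusters usually configure a real fencer such as sshfence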
- SCP command example
scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml hadoop@test31:/usr/local/hadoop/etc/hadoop
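- To push the same file to every node in one pass, a loop like the one below works (a sketch; the host list assumes the machines used in this guide):
for h in test31 test32 test33 test34; do
scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml hadoop@$h:/usr/local/hadoop/etc/hadoop
done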
- Update core-site.xml, then SCP it to the other machines (hadoop user)
nano /usr/local/hadoop/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://nncluster</value>
</property>
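- fs.defaultFS now points at the nameservice ID instead of a single host, so clients find the active NameNode through the failover proxy provider configured above. Once HDFS is back up, a path like the following resolves automatically:
hdfs dfs -ls hdfs://nncluster/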
- Create the journalnode directory on the three JournalNode machines (hadoop user)
mkdir ~/journalnode
- Start the JournalNodes and verify with jps (hadoop user)
hdfs --daemon start journalnode
- On the active NameNode only (hadoop user)
hdfs namenode -initializeSharedEdits
- Confirm that "Successfully started new epoch 1" appears
- If this cluster is brand new and has never been used, run a format first!!
hdfs namenode -format
- Start the first NameNode (hadoop user)
hdfs --daemon start namenode
- On the second NameNode, copy over the metadata (hadoop user)
hdfs namenode -bootstrapStandby
- Confirm that "has been successfully formatted" appears
- Start the second NameNode (hadoop user)
hdfs --daemon start namenode
- Stop all NameNodes, then start them again (hadoop user)
stop-dfs.sh
start-dfs.sh
- Both NameNodes and all three JournalNodes stop and start together
- Transition the first NameNode to active, then check the states (hadoop user)
hdfs haadmin -transitionToActive nn1
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
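- Each getServiceState command prints a single word; at this point nn1 should report active and nn2 standby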
- Start YARN (hadoop user)
start-yarn.sh
- Start the Job History Server (hadoop user)
mapred --daemon start historyserver
- Switch the active NameNode over (hadoop user)
hdfs haadmin -transitionToStandby nn1
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -transitionToActive nn2
hdfs haadmin -getServiceState nn2
- Run a Pi job to make sure the newly active NameNode works properly (hadoop user)
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 30 100
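- A successful run ends with a line like "Estimated value of Pi is 3.14..."; if the job stalls, re-check the NameNode states above first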
-
Download and install ZooKeeper (on all three ZooKeeper machines) (administrator)
- Download ZooKeeper
wget http://ftp.tc.edu.tw/pub/Apache/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz
- Extract the archive
tar -xvf apache-zookeeper-3.5.6-bin.tar.gz -C /usr/local
- Rename the directory
mv /usr/local/apache-zookeeper-3.5.6-bin /usr/local/zookeeper
- Change the owner
chown -R hadoop:hadoop /usr/local/zookeeper
-
Copy zoo_sample.cfg to zoo.cfg and edit it (it can then be SCP'd to the other two machines) (hadoop user)
cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg
nano /usr/local/zookeeper/conf/zoo.cfg
dataDir=/usr/local/zookeeper/zoodata #modify
admin.serverPort=8010 #add
server.1=test32.example.org:2888:3888 #add
server.2=test33.example.org:2888:3888 #add
server.3=test34.example.org:2888:3888 #add
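- Port notes: 2888 is used by followers to talk to the leader, 3888 is used for leader election, and clients (the ZKFCs and ResourceManagers later on) connect on the default clientPort 2181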
- Edit zkEnv.sh (it can then be SCP'd to the other two machines) (hadoop user)
nano /usr/local/zookeeper/bin/zkEnv.sh
#add the following
ZOO_LOG_DIR="/usr/local/zookeeper/logs"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
- Create the data and log directories, then write the myid file (hadoop user)
mkdir /usr/local/zookeeper/zoodata
mkdir /usr/local/zookeeper/logs
echo "1" > /usr/local/zookeeper/zoodata/myid #on the first ZooKeeper machine
echo "2" > /usr/local/zookeeper/zoodata/myid #on the second ZooKeeper machine
echo "3" > /usr/local/zookeeper/zoodata/myid #on the third ZooKeeper machine
- The myid value must match the corresponding server.N entry in zoo.cfg
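- If passwordless SSH is set up between the nodes, the three myid files can also be written in one pass from any machine (a sketch; hostnames follow this guide):
for i in 1 2 3; do
ssh hadoop@test3$((i+1)).example.org "mkdir -p /usr/local/zookeeper/zoodata && echo $i > /usr/local/zookeeper/zoodata/myid"
done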
-
Update the environment variables (hadoop user)
- Edit .bashrc
nano ~/.bashrc
- Add the environment variables
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
- Reload the environment variables
source ~/.bashrc # or: . ~/.bashrc
-
Start ZooKeeper (on all three machines)
zkServer.sh start
zkServer.sh status
jps
- While only one node has been started, checking the status reports "It is probably not running." This just means there are no other ZooKeeper nodes to communicate with yet
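- Once all three nodes are up, zkServer.sh status should show Mode: leader on one node and Mode: follower on the other two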
- Stop the following services, in order
#stop the History Server
mapred --daemon stop historyserver
#stop the ResourceManager
stop-yarn.sh
#stop the NameNodes
stop-dfs.sh
- Add the following to hdfs-site.xml, then SCP it to the other machines (hadoop user)
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<!-- add -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
- Add the following to core-site.xml, then SCP it to the other machines (hadoop user)
nano /usr/local/hadoop/etc/hadoop/core-site.xml
<!-- add -->
<property>
<name>ha.zookeeper.quorum</name>
<value>test32.example.org:2181,test33.example.org:2181,test34.example.org:2181</value>
</property>
- On a NameNode only (hadoop user)
hdfs zkfc -formatZK
- Confirm that "Successfully created /hadoop-ha/nncluster in ZK" appears
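- Optional check from any ZooKeeper machine: list the new HA znode; the expected output is roughly [nncluster]
zkCli.sh -server test32.example.org:2181 ls /hadoop-ha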
- Start the NameNodes (from a NameNode machine) (hadoop user)
start-dfs.sh
- The DFSZKFailoverController service starts automatically
- Test NameNode automatic failover (on a NameNode machine) (hadoop user)
hdfs --daemon stop namenode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
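- The surviving NameNode should report active within a few seconds; restart the stopped one and it rejoins as standby
hdfs --daemon start namenode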
- Add and remove the following properties in yarn-site.xml, then SCP it to the other machines (hadoop user)
nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
<!-- remove this property -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>test31.example.org</value>
</property>
<!-- add these properties -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>rmcluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>test31.example.org</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>test30.example.org</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>test31.example.org:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>test30.example.org:8088</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>test32.example.org:2181,test33.example.org:2181,test34.example.org:2181</value>
</property>
- Start the following services, in order (hadoop user)
#start the ResourceManager
start-yarn.sh
#start the History Server
mapred --daemon start historyserver
- Test ResourceManager automatic failover (on a ResourceManager machine) (hadoop user)
yarn --daemon stop resourcemanager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
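- The other ResourceManager should take over as active; restart the stopped one and it rejoins as standby
yarn --daemon start resourcemanager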
- If you have modified spark-defaults.conf so that applications load their jar files from HDFS, remember to update it as well
nano /usr/local/spark/conf/spark-defaults.conf
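- A minimal sketch: if the file pins jars to a specific NameNode host, point it at the nameservice ID instead (the /spark-jars path below is hypothetical; use wherever you actually uploaded the jars):
spark.yarn.jars hdfs://nncluster/spark-jars/*.jar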