Getting Started with Your First Hadoop Deployment – Part 5 – Cluster High Availability (HA) Setup

Proceed only after the Hadoop cluster has been configured and its services have been started.

Warning: the following steps are fairly complex and require some hands-on experience.

* Prerequisites:
    1. Three worker machines to act as the JournalNodes and ZooKeeper nodes
    2. The original NameNode machine will additionally host a standby ResourceManager
    3. The original ResourceManager machine will additionally host a standby NameNode

  1. First make sure that both the HDFS and YARN services of the cluster have been stopped !!
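If you are not sure, a minimal way to check (as the hadoop user) is to run the stop scripts again and then confirm with jps that no NameNode, DataNode, ResourceManager or NodeManager processes remain on any machine:

stop-yarn.sh
stop-dfs.sh
jps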

  2. Add the following properties to hdfs-site.xml, then scp the file to the other machines (as the hadoop user)

  • See the scp command example below for how to copy the file
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<property>
 <name>dfs.nameservices</name>
 <value>nncluster</value>
</property>
<property>
 <name>dfs.ha.namenodes.nncluster</name>
 <value>nn1,nn2</value>
</property>
<property>
 <name>dfs.namenode.rpc-address.nncluster.nn1</name>
 <value>test30.example.org:8020</value>
</property>
<property>
 <name>dfs.namenode.http-address.nncluster.nn1</name>
 <value>test30.example.org:9870</value>
</property>
<property>
 <name>dfs.namenode.rpc-address.nncluster.nn2</name>
 <value>test31.example.org:8020</value>
</property>
<property>
 <name>dfs.namenode.http-address.nncluster.nn2</name>
 <value>test31.example.org:9870</value>
</property>
<property>
 <name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://test32.example.org:8485;test33.example.org:8485;test34.example.org:8485/nncluster</value>
</property>
<property>
 <name>dfs.journalnode.edits.dir</name>
 <value>/home/hadoop/journalnode</value>
</property>
<property>
 <name>dfs.ha.fencing.methods</name>
 <value>shell(/bin/true)</value>
</property>
 <property>
 <name>dfs.client.failover.proxy.provider.nncluster</name>
 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
  • scp command example
scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml hadoop@test31:/usr/local/hadoop/etc/hadoop
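If you prefer to copy the file to every other node in one go, a small loop works as well (this sketch assumes you are editing on test30 and that the remaining nodes are test31 through test34; adjust the hostnames to your cluster):

for h in test31 test32 test33 test34; do
  scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml hadoop@$h:/usr/local/hadoop/etc/hadoop
done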

  3. Update core-site.xml, then scp the file to the other machines (as the hadoop user)
nano /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://nncluster</value>
</property>
  4. Create the journalnode directory on the three JournalNode machines (as the hadoop user)
mkdir ~/journalnode
  5. Start the JournalNode daemon and confirm it with jps (as the hadoop user)
hdfs --daemon start journalnode
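  • After starting, jps on each of the three machines should list a JournalNode process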

  6. On the active NameNode only (as the hadoop user)
hdfs namenode -initializeSharedEdits

  • Make sure the output contains Successfully started new epoch 1
  • If this cluster is brand new and has never been used, run a format first !!
hdfs namenode -format
  7. Start the first NameNode (as the hadoop user)
hdfs --daemon start namenode

  8. On the second NameNode, copy over the metadata (as the hadoop user)
hdfs namenode -bootstrapStandby

  • Make sure the output contains has been successfully formatted
  9. Start the second NameNode (as the hadoop user)
hdfs --daemon start namenode

  10. Stop all NameNodes, then start them again (as the hadoop user)
stop-dfs.sh
start-dfs.sh

  • Both NameNodes and the three JournalNodes are stopped and started together
  11. Transition the first NameNode to active and check the state of both (as the hadoop user)
hdfs haadmin -transitionToActive nn1
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
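  • Each getServiceState call prints a single word; at this point nn1 should report active and nn2 should report standby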

  12. Start YARN (as the hadoop user)
start-yarn.sh

  13. Start the Job History Server (as the hadoop user)
mapred --daemon start historyserver

  14. Switch the active NameNode over (as the hadoop user)
hdfs haadmin -transitionToStandby nn1
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -transitionToActive nn2
hdfs haadmin -getServiceState nn2

  15. Run a Pi job to check that the newly active NameNode works properly (as the hadoop user)
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 30 100
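  • The job should end with a line similar to Estimated value of Pi is 3.1..., which confirms jobs can run against the new active NameNode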

  16. Download and install ZooKeeper (on all three ZooKeeper machines) (as the administrator)

    1. Download ZooKeeper
    wget http://ftp.tc.edu.tw/pub/Apache/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz
    
    2. Extract the archive
    tar -xvf apache-zookeeper-3.5.6-bin.tar.gz -C /usr/local
    
    3. Rename the directory
    mv /usr/local/apache-zookeeper-3.5.6-bin /usr/local/zookeeper
    
    4. Change the owner
    chown -R hadoop:hadoop /usr/local/zookeeper
    
  17. Copy zoo_sample.cfg to zoo.cfg and edit it (you can scp it to the other two machines; see the example after the config) (as the hadoop user)

cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg
nano /usr/local/zookeeper/conf/zoo.cfg

dataDir=/usr/local/zookeeper/zoodata # modify
admin.serverPort=8010 # add
server.1=test32.example.org:2888:3888 # add
server.2=test33.example.org:2888:3888 # add
server.3=test34.example.org:2888:3888 # add
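The same zoo.cfg must exist on all three ZooKeeper machines; for example, if you edited it on test32, it can be copied to the other two like this:

scp /usr/local/zookeeper/conf/zoo.cfg hadoop@test33:/usr/local/zookeeper/conf
scp /usr/local/zookeeper/conf/zoo.cfg hadoop@test34:/usr/local/zookeeper/conf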
  18. Edit zkEnv.sh (you can scp it to the other two machines) (as the hadoop user)
nano /usr/local/zookeeper/bin/zkEnv.sh

# add
ZOO_LOG_DIR="/usr/local/zookeeper/logs"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"  
  19. Create the dataDir directory and write the myid file (as the hadoop user)
mkdir /usr/local/zookeeper/zoodata
echo "1" > /usr/local/zookeeper/zoodata/myid #第一台zookeeper做
echo "2" > /usr/local/zookeeper/zoodata/myid #第二台zookeeper做
echo "3" > /usr/local/zookeeper/zoodata/myid #第三台zookeeper做
  • myid請務必要與zoo.cfg設定一樣

  20. Update the environment variables (as the hadoop user)

    1. Edit .bashrc
    nano ~/.bashrc
    
    2. Add the environment variables
    export ZOOKEEPER_HOME=/usr/local/zookeeper
    export PATH=$PATH:$ZOOKEEPER_HOME/bin  
    

    3. Load the environment variables
    source ~/.bashrc # . ~/.bashrc
    
  21. Start ZooKeeper (on all three machines)

zkServer.sh start
zkServer.sh status
jps

  • While only one node is running, zkServer.sh status will report It is probably not running. This simply means it cannot reach the other ZooKeeper nodes yet.
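  • Once at least two nodes are running, zkServer.sh status should instead report Mode: leader on one node and Mode: follower on the others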
  22. Stop the following services, in this order
# stop the History Server
mapred --daemon stop historyserver
# stop the ResourceManager
stop-yarn.sh
# stop HDFS (NameNodes)
stop-dfs.sh
  23. Add the following property to hdfs-site.xml, then scp the file to the other machines (as the hadoop user)
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<!-- add -->
<property>
 <name>dfs.ha.automatic-failover.enabled</name>
 <value>true</value>
</property>
  24. Add the following property to core-site.xml, then scp the file to the other machines (as the hadoop user)
nano /usr/local/hadoop/etc/hadoop/core-site.xml

<!-- add -->
<property>
 <name>ha.zookeeper.quorum</name>
 <value>test32.example.org:2181,test33.example.org:2181,test34.example.org:2181</value>
</property>
  25. Run this on one of the NameNodes only (as the hadoop user)
hdfs zkfc -formatZK

  • Make sure the output contains Successfully created /hadoop-ha/nncluster in ZK
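If you want to double-check, you can also look for the znode directly with the ZooKeeper CLI (a quick sketch; any of the three ZooKeeper hosts will do):

zkCli.sh -server test32.example.org:2181
ls /hadoop-ha # inside the CLI; should list [nncluster]
quit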
  26. Start the NameNodes (run from a NameNode) (as the hadoop user)
start-dfs.sh

  • The DFSZKFailoverController daemon will be started automatically
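  • You can confirm this with jps on both NameNode machines: a DFSZKFailoverController process should appear next to the NameNode process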
  27. Test automatic NameNode failover (run on a NameNode) (as the hadoop user)
hdfs --daemon stop namenode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
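  • The surviving NameNode should report active within a few seconds; once you have confirmed the failover, start the stopped NameNode again so it rejoins as the standby
hdfs --daemon start namenode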

  28. Remove and add the following properties in yarn-site.xml, then scp the file to the other machines (as the hadoop user)

<!-- remove this property -->

<property>
<name>yarn.resourcemanager.hostname</name>
<value>test31.example.org</value>
</property>

<!-- add these properties -->

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>rmcluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>test31.example.org</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>test30.example.org</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>test31.example.org:8088</value>
</property>
<property>
 <name>yarn.resourcemanager.webapp.address.rm2</name>
 <value>test30.example.org:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>test32.example.org:2181,test33.example.org:2181,test34.example.org:2181</value>
</property>
  29. Start the following services, in this order (as the hadoop user)
# start the ResourceManagers
start-yarn.sh
# start the History Server
mapred --daemon start historyserver
  30. Test automatic ResourceManager failover (run on a ResourceManager) (as the hadoop user)
yarn --daemon stop resourcemanager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
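  • The remaining ResourceManager should report active; as with the NameNode test, start the stopped ResourceManager again afterwards
yarn --daemon start resourcemanager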

  31. If you modified spark-defaults.conf to load jar files when running jobs, remember to update it as well
nano /usr/local/spark/conf/spark-defaults.conf
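For example, if spark.yarn.jars previously pointed at a single NameNode address, it should now use the HDFS nameservice ID instead (the /spark-jars path below is only an illustration; keep whatever path you configured earlier):

spark.yarn.jars hdfs://nncluster/spark-jars/*.jar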

Congratulations, you have completed the final stage: the high availability (HA) setup!


If you found this content useful, you are welcome to buy me a coffee~
