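I set up a Hadoop HA cluster and manually ran kill -9 on the NameNode on machine 1, only to find that machine 2 did not automatically become active. The log showed this error: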
2018-10-31 14:11:02,098 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2018-10-31 14:11:02,098 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2018-10-31 14:11:02,099 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to create SSH session
com.jcraft.jsch.JSchException: java.io.FileNotFoundException: /home/root/.ssh/id_rsa (No such file or directory)
at com.jcraft.jsch.KeyPair.load(KeyPair.java:543)
at com.jcraft.jsch.IdentityFile.newInstance(IdentityFile.java:40)
at com.jcraft.jsch.JSch.addIdentity(JSch.java:407)
at com.jcraft.jsch.JSch.addIdentity(JSch.java:367)
at org.apache.hadoop.ha.SshFenceByTcpPort.createSession(SshFenceByTcpPort.java:122)
at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:91)
at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:532)
at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:921)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:820)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.io.FileNotFoundException: /home/root/.ssh/id_rsa (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at com.jcraft.jsch.Util.fromFile(Util.java:508)
at com.jcraft.jsch.KeyPair.load(KeyPair.java:540)
... 15 more
2018-10-31 14:11:02,099 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2018-10-31 14:11:02,099 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2018-10-31 14:11:02,099 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop1/192.168.150.151:9000
at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:533)
at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:921)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:820)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2018-10-31 14:11:02,099 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2018-10-31 14:11:02,102 INFO org.apache.zookeeper.ZooKeeper: Session: 0x166c8a424df00fb closed
2018-10-31 14:11:03,102 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@14671dfe
2018-10-31 14:11:03,103 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop2/192.168.150.152:2181. Will not attempt to authenticate using SASL (unknown error)
2018-10-31 14:11:03,104 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop2/192.168.150.152:2181, initiating session
2018-10-31 14:11:03,106 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop2/192.168.150.152:2181, sessionid = 0x266c8a421c9010f, negotiated timeout = 5000
2018-10-31 14:11:03,106 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2018-10-31 14:11:03,107 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2018-10-31 14:11:03,107 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2018-10-31 14:11:03,108 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a086d796861646f6f7012036e6e311a076861646f6f703120a84628d33e
2018-10-31 14:11:03,108 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at hadoop1/192.168.150.151:9000
The key line is this one:
com.jcraft.jsch.JSchException: java.io.FileNotFoundException: /home/root/.ssh/id_rsa (No such file or directory)
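I tested SSH, and all the machines could log in to each other without a password. Later I found out that the sshfence setting in hdfs-site.xml is used to SSH into the previous active NameNode and finish it off, so that there is guaranteed to be only one active NameNode. dfs.ha.fencing.ssh.private-key-files sets the location of the local private key file used for that login. My private key path was configured wrong, so the fencing step could not go through; the standby NameNode therefore could not be sure it was the only one alive, and did not dare switch to the active state.
My key is at /root/.ssh/id_rsa, but I had configured it as /home/root/.ssh/id_rsa.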
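For reference, here is a minimal sketch of what the fencing part of hdfs-site.xml might look like after the fix. It assumes sshfence is the only fencing method, which matches the "Trying method 1/1" line in the log. Only the key path is taken from this post; all other HA properties are omitted.

```xml
<!-- hdfs-site.xml (fragment): fencing settings only, other HA properties omitted -->

<!-- Fence the old active NameNode by logging in over SSH and killing the process -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<!-- Must point to a private key file that really exists on this machine;
     /home/root/.ssh/id_rsa was the wrong value here -->
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
```

The ZKFC process most likely has to be restarted before it picks up the corrected path, since the failing fencing call in the log runs inside ZKFailoverController.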