Hadoop Distributed Cluster Deployment

Reader submission · 283 · 2022-11-20


I. Cluster Environment Description

1. Host Planning

IP Address    Role     HDFS                          YARN
10.6.2.237    master   NameNode, DataNode            NodeManager
10.6.2.239    slave    DataNode, SecondaryNameNode   NodeManager
10.6.2.241    slave    DataNode                      ResourceManager, NodeManager

2. Planning Notes

For a distributed-cluster deployment exercise, three servers is the minimum.

A "Hadoop cluster" in the broad sense actually comprises two clusters, HDFS and YARN. Where "Hadoop cluster" appears below it mostly refers to this broad sense; keep the distinction in mind.

Some of the roles laid out in the table above can be moved around, and some can even be omitted; naturally, not every big-data role can be covered here either. What governs all of this is the requirements of the environment you are deploying for.

An HDFS master node normally stores only metadata, not business data, so in this plan 10.6.2.237 does not strictly have to run a DataNode. By the same reasoning, the SecondaryNameNode does not have to sit on 10.6.2.239; it could just as well be placed on 10.6.2.237 or 10.6.2.241, or left out altogether.

Likewise, in the YARN cluster the ResourceManager does not have to run on 10.6.2.241; it could be placed on 10.6.2.237 or 10.6.2.239. Unlike the optional roles above, though, it must be configured somewhere: it is a key role in any YARN cluster.

The table lists only the Hadoop and YARN roles. The ZooKeeper coordination service is not shown, but it must be deployed on all three nodes and formed into a ZooKeeper cluster.

The plan above simply "balances" the resources, with three service processes running on each machine. A real environment would not look like this: a master node generally does not store business data, so the actual layout depends on the specific environment.

II. Pre-Installation Preparation

1. Installation Steps

The installation itself is simple and boils down to three major steps:

Step 1: prepare the environment.
Step 2: install the ZooKeeper cluster.
Step 3: install the Hadoop cluster.

2. System Initialization

Perform the following steps on all three nodes.

1) Install the basic system packages

[root@hadoop01 ~]# yum -y install epel-release
[root@hadoop01 ~]# yum -y install net-tools gcc gcc-c++ lrzsz vim wget curl git zip unzip ntp telnet

2) Disable SELinux and the firewall

[root@hadoop01 ~]# setenforce 0
[root@hadoop01 ~]# sed -i 's/enforcing/disabled/' /etc/selinux/config
[root@hadoop01 ~]# systemctl stop firewalld && systemctl disable firewalld

3) Set the hostnames and add name resolution

[root@hadoop01 ~]# hostnamectl set-hostname hadoop01
[root@hadoop02 ~]# hostnamectl set-hostname hadoop02
[root@hadoop03 ~]# hostnamectl set-hostname hadoop03
[root@hadoop01 ~]# cat << EOF >> /etc/hosts
> 10.6.2.237 hadoop01
> 10.6.2.239 hadoop02
> 10.6.2.241 hadoop03
> EOF
### run the same /etc/hosts update on hadoop02 and hadoop03
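Since the same three entries go onto every node, it can help to stage them in a scratch file first and append the reviewed file to /etc/hosts on each host. A minimal sketch; the scratch filename is an assumption:

```shell
# Stage the host entries in a scratch file for review, then append the file
# to /etc/hosts on each node (e.g. cat hosts.append >> /etc/hosts).
cat > ./hosts.append << 'EOF'
10.6.2.237 hadoop01
10.6.2.239 hadoop02
10.6.2.241 hadoop03
EOF
cat ./hosts.append
```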

4) Configure time synchronization

[root@hadoop01 ~]# \cp -f /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@hadoop01 ~]# ntpdate ntp.aliyun.com
[root@hadoop01 ~]# systemctl start ntpdate && systemctl enable ntpdate
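Because ntpdate performs a one-shot sync, a cron entry can keep the clocks from drifting between reboots. This is only a sketch: the line is echoed here for review, and the schedule and NTP server are assumptions; install it with `crontab -e` if wanted.

```shell
# Echo a candidate crontab line (daily 03:00 resync); review it before
# adding it to root's crontab. Path and schedule are assumptions.
CRON_LINE='0 3 * * * /usr/sbin/ntpdate ntp.aliyun.com'
echo "$CRON_LINE"
```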

3. Install and Configure the JDK

1) Install the JDK

[root@hadoop01 ~]# tar -zxvf jdk-8u131-linux-x64.tar.gz -C /usr/local/
[root@hadoop01 ~]# cd /usr/local/jdk1.8.0_131/bin
[root@hadoop01 bin]# ./java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

2) Configure the environment variables

[root@hadoop01 ~]# cp /etc/profile /etc/profile.bak
[root@hadoop01 ~]# vim /etc/profile
# append the following two lines at the end of the file
export JAVA_HOME=/usr/local/jdk1.8.0_131
export PATH=.:$PATH:$JAVA_HOME/bin
[root@hadoop01 ~]# source /etc/profile
[root@hadoop01 ~]# java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
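A mistyped JAVA_HOME in /etc/profile is easy to miss until Hadoop fails to start, so a small guard can verify the candidate path before it is written. The helper below is hypothetical (not part of the original procedure):

```shell
# Hypothetical guard: a JAVA_HOME candidate is only valid if it contains an
# executable bin/java.
check_java_home() {
  if [ -x "$1/bin/java" ]; then
    echo "ok: $1"
  else
    echo "missing: $1/bin/java"
  fi
}
check_java_home /usr/local/jdk1.8.0_131
```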

4. Configure SSH Mutual Trust

1) Configure passwordless SSH login

[root@hadoop01 ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@hadoop01 ~]# ssh-copy-id hadoop01
[root@hadoop01 ~]# ssh-copy-id hadoop02
[root@hadoop01 ~]# ssh-copy-id hadoop03
[root@hadoop02 ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@hadoop02 ~]# ssh-copy-id hadoop01
[root@hadoop02 ~]# ssh-copy-id hadoop02
[root@hadoop02 ~]# ssh-copy-id hadoop03
[root@hadoop03 ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@hadoop03 ~]# ssh-copy-id hadoop01
[root@hadoop03 ~]# ssh-copy-id hadoop02
[root@hadoop03 ~]# ssh-copy-id hadoop03
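The three ssh-copy-id invocations on each node can be looped instead of typed out. The sketch below is a dry run that only echoes the commands; remove the surrounding `echo` layer to actually run it on a node where the key pair already exists.

```shell
# Dry run: print the ssh-copy-id command for each node in the cluster.
key_copy_plan() {
  for node in hadoop01 hadoop02 hadoop03; do
    echo "ssh-copy-id $node"
  done
}
key_copy_plan
```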

2) Test the passwordless login

[root@hadoop01 ~]# ssh hadoop01
[root@hadoop01 ~]# ssh hadoop02
[root@hadoop01 ~]# ssh hadoop03
[root@hadoop02 ~]# ssh hadoop01
[root@hadoop02 ~]# ssh hadoop02
[root@hadoop02 ~]# ssh hadoop03
[root@hadoop03 ~]# ssh hadoop01
[root@hadoop03 ~]# ssh hadoop02
[root@hadoop03 ~]# ssh hadoop03

5. Install the ZooKeeper Cluster

1) Install the cluster

[root@hadoop01 ~]# tar -zxvf zookeeper-3.4.14.tar.gz -C /usr/local/
[root@hadoop01 ~]# mv /usr/local/zookeeper-3.4.14 /usr/local/zookeeper
[root@hadoop01 ~]# cd /usr/local/zookeeper/conf
[root@hadoop01 conf]# cp zoo_sample.cfg zoo.cfg
[root@hadoop01 conf]# vim zoo.cfg
......
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/logs
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
[root@hadoop01 ~]# scp -r /usr/local/zookeeper hadoop02:/usr/local
[root@hadoop01 ~]# scp -r /usr/local/zookeeper hadoop03:/usr/local
[root@hadoop01 ~]# mkdir -p /data/zookeeper/{data,logs}
[root@hadoop01 ~]# scp -r /data hadoop02:/
[root@hadoop01 ~]# scp -r /data hadoop03:/
[root@hadoop01 ~]# echo "1" > /data/zookeeper/data/myid
[root@hadoop02 ~]# echo "2" > /data/zookeeper/data/myid
[root@hadoop03 ~]# echo "3" > /data/zookeeper/data/myid
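Since each node's myid must match its `server.N` entry in zoo.cfg, and the hostnames here conveniently end in the same digits, the value can be derived from the hostname so one snippet works unchanged on all three nodes. A hypothetical helper, relying on the hadoopNN naming used in this deployment:

```shell
# Hypothetical helper: derive myid from the hostname suffix (hadoop01 -> 1).
derive_myid() {
  suffix="${1#hadoop}"             # strip the fixed prefix: hadoop01 -> 01
  echo "$suffix" | sed 's/^0*//'   # drop leading zeros: 01 -> 1
}
derive_myid hadoop01
derive_myid hadoop03
```

On a real node this would be used as `derive_myid "$(hostname)" > /data/zookeeper/data/myid`.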

2) Start the cluster

[root@hadoop01 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop01 bin]# zkServer.sh start
[root@hadoop02 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop02 bin]# zkServer.sh start
[root@hadoop03 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop03 bin]# zkServer.sh start

3) Verify the cluster startup

[root@hadoop01 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop01 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop02 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop02 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop03 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop03 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
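With the passwordless SSH configured earlier, all three nodes can be checked from one terminal. This sketch is a dry run that only echoes the remote commands; drop the inner `echo` to execute them.

```shell
# Dry run: print the per-node status check commands for review.
zk_status_plan() {
  for node in hadoop01 hadoop02 hadoop03; do
    echo "== $node =="
    echo "ssh $node /usr/local/zookeeper/bin/zkServer.sh status"
  done
}
zk_status_plan
```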

4) Configure the environment variables

[root@hadoop01 ~]# cat << 'EOF' >> /etc/profile
> export ZOOKEEPER_HOME=/usr/local/zookeeper
> export PATH=.:$PATH:$ZOOKEEPER_HOME/bin
> EOF
[root@hadoop01 ~]# source /etc/profile
[root@hadoop01 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
### quote the here-document delimiter ('EOF') so $PATH and $ZOOKEEPER_HOME are written literally instead of being expanded; run the same on hadoop02 and hadoop03

III. Installation and Deployment

1. Upload and Unpack the Installation Package

[root@hadoop01 ~]# tar -zxvf hadoop-2.7.3.tar.gz -C /usr/local/
[root@hadoop01 ~]# mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
[root@hadoop01 ~]# cd /usr/local/hadoop/etc/hadoop/

2. Edit the Configuration Files

[root@hadoop01 hadoop]# cp hadoop-env.sh hadoop-env.sh.default
[root@hadoop01 hadoop]# vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_131

[root@hadoop01 hadoop]# cp yarn-env.sh yarn-env.sh.default
[root@hadoop01 hadoop]# vim yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_131

[root@hadoop01 hadoop]# cp hdfs-site.xml hdfs-site.xml.default
[root@hadoop01 hadoop]# vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop02:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

[root@hadoop01 hadoop]# cp yarn-site.xml yarn-site.xml.default
[root@hadoop01 hadoop]# vim yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop03:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop03:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop03:8050</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop03:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop03:8088</value>
  </property>
</configuration>

[root@hadoop01 hadoop]# cp core-site.xml core-site.xml.default
[root@hadoop01 hadoop]# vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmpdata</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>10240</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>1000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>

[root@hadoop01 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop01 hadoop]# vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

[root@hadoop01 hadoop]# cp slaves slaves.default
[root@hadoop01 hadoop]# vim slaves
hadoop01
hadoop02
hadoop03
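The slaves file is just one hostname per line, so it can also be generated from the same node list used in the loops above. A sketch that writes to a scratch path (an assumption) which would then be copied into $HADOOP_HOME/etc/hadoop/slaves:

```shell
# Generate a slaves file from the node list; $NODES is left unquoted on
# purpose so word splitting gives printf one argument per hostname.
NODES="hadoop01 hadoop02 hadoop03"
printf '%s\n' $NODES > ./slaves.generated
cat ./slaves.generated
```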

3. Create the Data Directories

[root@hadoop01 ~]# cd /data    # the data path defined in the configs (ZooKeeper uses the same path)
[root@hadoop01 data]# mkdir -p hadoop/{namenode,datanode,tmpdata}

4. Copy the Hadoop Directory to the Slave Nodes

[root@hadoop01 ~]# scp -r /usr/local/hadoop hadoop02:/usr/local
[root@hadoop01 ~]# scp -r /usr/local/hadoop hadoop03:/usr/local
[root@hadoop01 ~]# scp -r /data/hadoop hadoop02:/data
[root@hadoop01 ~]# scp -r /data/hadoop hadoop03:/data
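The four scp commands above follow one pattern per slave node, so they can be generated and reviewed before running. A dry-run sketch that only echoes the commands:

```shell
# Dry run: print the distribution commands for each slave node.
push_plan() {
  for node in hadoop02 hadoop03; do
    echo "scp -r /usr/local/hadoop $node:/usr/local"
    echo "scp -r /data/hadoop $node:/data"
  done
}
push_plan
```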

5. Configure the Hadoop Environment Variables

[root@hadoop01 ~]# cat << 'EOF' >> /etc/profile
> export HADOOP_HOME=/usr/local/hadoop
> export PATH=.:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
> EOF
[root@hadoop01 ~]# source /etc/profile
### quote the here-document delimiter ('EOF') so the variables are written literally; all nodes need this, so repeat it on the other two

IV. Initialize and Start the Cluster

1. Initialize the Cluster (run on the designated master node)

[root@hadoop01 ~]# hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
......
21/06/02 08:06:47 INFO common.Storage: Storage directory /data/hadoop/namenode has been successfully formatted.
21/06/02 08:06:47 INFO namenode.FSImageFormatProtobuf: Saving image file /data/hadoop/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
21/06/02 08:06:47 INFO namenode.FSImageFormatProtobuf: Image file /data/hadoop/namenode/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
21/06/02 08:06:47 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/06/02 08:06:47 INFO util.ExitUtil: Exiting with status 0
21/06/02 08:06:47 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop01/10.6.2.237
************************************************************/
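Re-running the format on a live cluster wipes the existing HDFS metadata, so a guard that checks the NameNode directory (the dfs.namenode.name.dir path configured above) before formatting is a cheap safety net. The helper is hypothetical, not part of the original procedure:

```shell
# Hypothetical guard: only format when the NameNode metadata dir is empty
# or absent; a non-empty dir means the cluster was already initialized.
safe_to_format() {
  if [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
    echo "non-empty, refusing to format: $1"
  else
    echo "empty or absent, ok to format: $1"
  fi
}
safe_to_format /data/hadoop/namenode
```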

2. Start the Cluster

1) Start the HDFS cluster (run on hadoop01)

[root@hadoop01 ~]# start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop01.out
hadoop01: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop01.out
hadoop03: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop03.out
hadoop02: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop02.out
Starting secondary namenodes [hadoop02]
hadoop02: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-hadoop02.out

2) Start the YARN cluster (run on hadoop03)

[root@hadoop03 ~]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-hadoop03.out
hadoop01: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop01.out
hadoop03: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop03.out
hadoop02: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop02.out

3) Notes on the commands

1) Start/stop scripts: start-dfs.sh / stop-dfs.sh start and stop all HDFS daemons; start-yarn.sh / stop-yarn.sh do the same for the YARN daemons.

2) Alternative: start-all.sh / stop-all.sh are equivalent to running both pairs of scripts above.

3) Single-daemon commands (rarely used): hadoop-daemon.sh start/stop XXX and yarn-daemon.sh start/stop XXX.

4) Command directories: unlike a typical service, which installs a single bin directory of executables, a Hadoop deployment provides two, bin and sbin. The cluster start/stop scripts live in $HADOOP_HOME/sbin, while $HADOOP_HOME/bin holds the hadoop/hdfs/yarn shell commands. Both paths were added to PATH when the environment variables were configured above, so after that the exact locations no longer need to be remembered.

3. Check the Cluster Processes

[root@hadoop01 ~]# jps
18720 DataNode
19089 Jps
18346 QuorumPeerMain
18587 NameNode
18892 NodeManager
[root@hadoop02 ~]# jps
15121 DataNode
15410 Jps
15219 SecondaryNameNode
15063 QuorumPeerMain
15272 NodeManager
[root@hadoop03 ~]# jps
17696 DataNode
18219 Jps
17808 ResourceManager
17625 QuorumPeerMain
17906 NodeManager
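The expected daemon set per node comes straight from the planning table at the top, so it can be encoded once and compared against the actual jps output on each host. A small sketch (the helper name is an assumption):

```shell
# Expected daemons per node, taken from the host-planning table above.
expected_daemons() {
  case "$1" in
    hadoop01) echo "QuorumPeerMain NameNode DataNode NodeManager" ;;
    hadoop02) echo "QuorumPeerMain SecondaryNameNode DataNode NodeManager" ;;
    hadoop03) echo "QuorumPeerMain ResourceManager DataNode NodeManager" ;;
  esac
}
expected_daemons hadoop01
expected_daemons hadoop03
```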

V. Verify Access from a Browser

1. Access the HDFS cluster web UI (hadoop01:50070 and hadoop02:50090, as configured above)

2. Access the YARN cluster web UI (hadoop03:8088, as configured above)
