2022-11-19
Hadoop Development Environment Setup (Ubuntu)
Hadoop Runtime Environment Setup
This comes from a friend's blog. The setup was done on Ubuntu and is provided for reference only.
1. Virtual machine preparation

Hostname | IP | Memory | OS
---|---|---|---
hadoop101 | 192.168.10.11 | 2G | Ubuntu 20.04.3 LTS
hadoop102 | 192.168.10.12 | 2G | Ubuntu 20.04.3 LTS
hadoop103 | 192.168.10.13 | 2G | Ubuntu 20.04.3 LTS

The following steps must be performed on all three servers.
1.1 Add hostname mappings

```bash
ren@hadoop101:~$ cat /etc/hosts
192.168.10.11 hadoop101
192.168.10.12 hadoop102
192.168.10.13 hadoop103
```
1.2 Set up passwordless SSH among the three hosts

```bash
ren@hadoop101:~$ ssh-keygen
ren@hadoop101:~$ ssh-copy-id ren@192.168.10.11
ren@hadoop101:~$ ssh-copy-id ren@192.168.10.12
ren@hadoop101:~$ ssh-copy-id ren@192.168.10.13
```
1.3 Create the project directories

```bash
ren@hadoop101:~$ sudo mkdir /opt/{module,software}
ren@hadoop101:~$ sudo chown -R ren:ren /opt/*
```
2. Hadoop deployment
2.1 Cluster deployment plan

Service | hadoop101 | hadoop102 | hadoop103
---|---|---|---
HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode
YARN | NodeManager | ResourceManager, NodeManager | NodeManager
2.2 Common ports

Port | Hadoop 2.x | Hadoop 3.x
---|---|---
NameNode internal RPC | 8020/9000 | 8020/9000/9820
NameNode HTTP UI | 50070 | 9870
YARN job-monitoring web UI (ResourceManager) | 8088 | 8088
JobHistory server web UI | 19888 | 19888
2.3 Downloads
The resources are shared via a Baidu Netdisk link: /s/1VrFIfFpxq4S0nrrMGcKw3w, extraction code: 986o
2.4 Upload the resources to the software directory

```bash
ren@hadoop101:~$ ll /opt/software/
total 767908
drwxr-xr-x 2 ren  ren       4096 Mar  2 09:39 ./
drwxr-xr-x 4 root root      4096 Mar  1 18:56 ../
-rw-r--r-- 1 ren  ren  607792249 Mar  1 18:57 hadoop-3.3.1-aarch64.tar.gz
-rw-r--r-- 1 ren  ren        115 Mar  1 18:58 hadoop.sh
-rw-r--r-- 1 ren  ren  178509312 Mar  1 18:58 jdk-8u321-fcs-bin-b07-linux-aarch64-15_dec_2021.tar
-rw-r--r-- 1 ren  ren         75 Mar  1 18:58 jdk.sh
-rwxrwxr-x 1 ren  ren        157 Mar  2 09:39 jpsall*
-rwxrwxr-x 1 ren  ren       1086 Mar  2 09:30 myhadoop*
-rwxrwxr-x 1 ren  ren        691 Mar  1 19:00 xsync*
```
2.5 Install the JDK
Unpack the JDK into the module directory:

```bash
ren@hadoop101:/opt/software$ tar xf jdk-8u321-fcs-bin-b07-linux-aarch64-15_dec_2021.tar -C ../module/
```
Add the JDK environment variables system-wide:

```bash
ren@hadoop101:/opt/software$ sudo cp jdk.sh /etc/profile.d/
ren@hadoop101:/opt/software$ source /etc/profile.d/jdk.sh
```
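The contents of jdk.sh are not shown in the source. A minimal sketch, assuming the JDK unpacked to /opt/module/jdk1.8.0_321 (the path the jpsall script later in this post uses), might look like:

```shell
# Hypothetical jdk.sh; the install path is an assumption inferred from
# the jpsall script (/opt/module/jdk1.8.0_321/bin/jps) in section 4.3.
export JAVA_HOME=/opt/module/jdk1.8.0_321
export PATH=$PATH:$JAVA_HOME/bin
```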
Verify that java produces the following output:

```bash
ren@hadoop101:/opt/software$ java -version
java version "1.8.0_321"
Java(TM) SE Runtime Environment (build 1.8.0_321-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.321-b07, mixed mode)
```
2.6 Install Hadoop
Unpack Hadoop into the module directory:

```bash
ren@hadoop101:/opt/software$ tar xf hadoop-3.3.1-aarch64.tar.gz -C ../module/
```
Add the Hadoop environment variables system-wide:

```bash
ren@hadoop101:/opt/software$ sudo cp hadoop.sh /etc/profile.d/
ren@hadoop101:/opt/software$ source /etc/profile.d/hadoop.sh
```
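hadoop.sh is likewise not shown. A minimal sketch, assuming Hadoop lives in /opt/module/hadoop-3.3.1 (the path used by the myhadoop script in section 4.2):

```shell
# Hypothetical hadoop.sh; the path matches the myhadoop script later in the post.
export HADOOP_HOME=/opt/module/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```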
Test that the installation succeeded:

```bash
ren@hadoop101:/opt/module$ hadoop version
Hadoop 3.3.1
Source code repository -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T10:51Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /opt/module/hadoop-3.3.1/share/hadoop/common/hadoop-common-3.3.1.jar
```
2.7 Configure Hadoop
2.7.1 Default configuration files

```bash
./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
./share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
./share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
```
2.7.2 Custom configuration files
etc/hadoop/core-site.xml
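The file contents were lost from the source. A minimal core-site.xml consistent with the deployment plan (NameNode on hadoop101, internal port 8020 per section 2.2, operating user ren) might look like the sketch below; the data directory value is an assumption inferred from the block-file path shown in section 5.1:

```xml
<?xml version="1.0"?>
<configuration>
    <!-- Address of the NameNode: hadoop101, port 8020 per section 2.2 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101:8020</value>
    </property>
    <!-- Assumed data directory under the Hadoop install (see section 5.1) -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.3.1/data</value>
    </property>
    <!-- Static web-UI user, so files created through the UI belong to ren -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>ren</value>
    </property>
</configuration>
```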
etc/hadoop/hdfs-site.xml
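A minimal hdfs-site.xml matching the plan (NameNode web UI on hadoop101:9870 per section 2.2, SecondaryNameNode on hadoop103 per section 2.1); the SecondaryNameNode port 9868 is the Hadoop 3.x default and is an assumption here:

```xml
<?xml version="1.0"?>
<configuration>
    <!-- NameNode HTTP UI: port 9870 per section 2.2 -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop101:9870</value>
    </property>
    <!-- SecondaryNameNode HTTP address: hadoop103 per section 2.1 -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop103:9868</value>
    </property>
</configuration>
```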
etc/hadoop/yarn-site.xml
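A minimal yarn-site.xml matching the plan (ResourceManager on hadoop102 per section 2.1):

```xml
<?xml version="1.0"?>
<configuration>
    <!-- MapReduce on YARN requires the shuffle auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager runs on hadoop102 per section 2.1 -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop102</value>
    </property>
</configuration>
```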
etc/hadoop/mapred-site.xml
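A minimal mapred-site.xml matching the plan (JobHistory server on hadoop101 per the myhadoop script, web UI port 19888 per section 2.2); port 10020 is the Hadoop default for the history server RPC address and is an assumption here:

```xml
<?xml version="1.0"?>
<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server RPC address; 10020 is the Hadoop default (assumed) -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop101:10020</value>
    </property>
    <!-- JobHistory server web UI: port 19888 per section 2.2 -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop101:19888</value>
    </property>
</configuration>
```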
2.7.3 Configure workers
etc/hadoop/workers:

```bash
hadoop101
hadoop102
hadoop103
```
3. Project sync
Sync the hadoop project directory from hadoop101 to hadoop102 and hadoop103 using the custom script xsync:

```bash
#!/bin/bash
# 1. Check the argument count
if [ $# -lt 1 ]; then
    echo Not Enough Arguments!
    exit
fi
# 2. Loop over every machine in the cluster
for host in hadoop101 hadoop102 hadoop103; do
    echo ==================== $host ====================
    # 3. Loop over all given paths and send them one by one
    for file in $@; do
        # 4. Check that the file exists
        if [ -e $file ]; then
            # 5. Get the parent directory
            pdir=$(cd -P $(dirname $file); pwd)
            # 6. Get the file name
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo $file does not exist!
        fi
    done
done
```
3.1 Sync the directory

```bash
ren@hadoop101:/opt/module$ xsync /opt/module/
```
4. Start the cluster
4.1 If this is the cluster's first start, format the NameNode on hadoop101

```bash
hdfs namenode -format
```
4.2 Start the cluster
This uses the custom script myhadoop. Usage: myhadoop <start|stop>

```bash
#!/bin/bash
if [ $# -lt 1 ]; then
    echo "No Args Input..."
    exit
fi
case $1 in
"start")
    echo " =================== Starting the hadoop cluster ==================="
    echo " --------------- starting hdfs ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.3.1/sbin/start-dfs.sh"
    echo " --------------- starting yarn ---------------"
    ssh hadoop102 "/opt/module/hadoop-3.3.1/sbin/start-yarn.sh"
    echo " --------------- starting historyserver ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.3.1/bin/mapred --daemon start historyserver"
    ;;
"stop")
    echo " =================== Stopping the hadoop cluster ==================="
    echo " --------------- stopping historyserver ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.3.1/bin/mapred --daemon stop historyserver"
    echo " --------------- stopping yarn ---------------"
    ssh hadoop102 "/opt/module/hadoop-3.3.1/sbin/stop-yarn.sh"
    echo " --------------- stopping hdfs ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.3.1/sbin/stop-dfs.sh"
    ;;
*)
    echo "Input Args Error..."
    ;;
esac
```

```bash
ren@hadoop101:/opt/module/hadoop-3.3.1$ myhadoop start
```
4.3 Check cluster status
This uses the custom script jpsall. Usage: jpsall

```bash
#!/bin/bash
for host in hadoop101 hadoop102 hadoop103; do
    echo =============== $host ===============
    ssh $host /opt/module/jdk1.8.0_321/bin/jps
done
```

Check that all expected processes are running. With this configuration, the NameNode and JobHistoryServer run on hadoop101, the ResourceManager on hadoop102, and the SecondaryNameNode on hadoop103.

```bash
ren@hadoop101:/opt/module/hadoop-3.3.1$ jpsall
=============== hadoop101 ===============
17907 JobHistoryServer
17172 NameNode
17383 DataNode
21559 Jps
17689 NodeManager
=============== hadoop102 ===============
19267 DataNode
19495 ResourceManager
19851 NodeManager
22975 Jps
=============== hadoop103 ===============
16487 NodeManager
16167 DataNode
20984 Jps
16333 SecondaryNameNode
```
4.4 Cluster web pages
The HDFS web UI is the cluster's file management page; the YARN web UI shows resource scheduling. Individual service components can also be started and stopped one at a time.
If a service on a single machine misbehaves, start or stop it individually:

```bash
hdfs --daemon start/stop namenode/datanode/secondarynamenode
```
Start / stop YARN daemons:

```bash
yarn --daemon start/stop resourcemanager/nodemanager
```
5. Basic cluster tests
5.1 Upload a file to the cluster
Create a directory in HDFS:

```bash
ren@hadoop101:/opt/module/hadoop-3.3.1$ hadoop fs -mkdir /input
```

Upload a small file to HDFS:

```bash
hadoop fs -put /opt/module/hadoop-3.3.1/wcinput/word.txt /input
```

Where the data is actually stored on disk:

```bash
ren@hadoop101:/opt/module/hadoop-3.3.1$ cat data/dfs/data/current/BP-2027323945-192.168.10.11-1646133053594/current/finalized/subdir0/subdir0/blk_1073741825
test hehe haha test za zhe
```
5.2 Run a job and save the result to HDFS

```bash
ren@hadoop101:/opt/module/hadoop-3.3.1$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /wcountput
2022-03-02 15:44:37,508 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop102/192.168.10.12:8032
2022-03-02 15:44:37,882 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/ren/.staging/job_1646190571735_0001
2022-03-02 15:44:38,221 INFO input.FileInputFormat: Total input files to process : 1
2022-03-02 15:44:38,302 INFO mapreduce.JobSubmitter: number of splits:1
2022-03-02 15:44:38,450 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1646190571735_0001
2022-03-02 15:44:38,450 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-03-02 15:44:38,561 INFO conf.Configuration: resource-types.xml not found
2022-03-02 15:44:38,562 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-03-02 15:44:38,716 INFO impl.YarnClientImpl: Submitted application application_1646190571735_0001
2022-03-02 15:44:38,741 INFO mapreduce.Job: The url to track the job: http://hadoop102:8088/proxy/application_1646190571735_0001/
2022-03-02 15:44:38,741 INFO mapreduce.Job: Running job: job_1646190571735_0001
2022-03-02 15:44:43,845 INFO mapreduce.Job: Job job_1646190571735_0001 running in uber mode : false
2022-03-02 15:44:43,851 INFO mapreduce.Job: map 0% reduce 0%
2022-03-02 15:44:46,897 INFO mapreduce.Job: map 100% reduce 0%
2022-03-02 15:44:51,958 INFO mapreduce.Job: map 100% reduce 100%
2022-03-02 15:44:52,996 INFO mapreduce.Job: Job job_1646190571735_0001 completed successfully
2022-03-02 15:44:53,065 INFO mapreduce.Job: Counters: 54
    File System Counters
        FILE: Number of bytes read=58
        FILE: Number of bytes written=545001
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=128
        HDFS: Number of bytes written=32
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        HDFS: Number of bytes read erasure-coded=0
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=1323
        Total time spent by all reduces in occupied slots (ms)=2244
        Total time spent by all map tasks (ms)=1323
        Total time spent by all reduce tasks (ms)=2244
        Total vcore-milliseconds taken by all map tasks=1323
        Total vcore-milliseconds taken by all reduce tasks=2244
        Total megabyte-milliseconds taken by all map tasks=1354752
        Total megabyte-milliseconds taken by all reduce tasks=2297856
    Map-Reduce Framework
        Map input records=6
        Map output records=6
        Map output bytes=51
        Map output materialized bytes=58
        Input split bytes=101
        Combine input records=6
        Combine output records=5
        Reduce input groups=5
        Reduce shuffle bytes=58
        Reduce input records=5
        Reduce output records=5
        Spilled Records=10
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=75
        CPU time spent (ms)=560
        Physical memory (bytes) snapshot=498315264
        Virtual memory (bytes) snapshot=4984467456
        Total committed heap usage (bytes)=403177472
        Peak Map Physical memory (bytes)=287354880
        Peak Map Virtual memory (bytes)=2484170752
        Peak Reduce Physical memory (bytes)=210960384
        Peak Reduce Virtual memory (bytes)=2500296704
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=27
    File Output Format Counters
        Bytes Written=32
```
5.3 Viewing the results in the web UI
Every job's output is saved to HDFS. Here wordcount counted the occurrences of each word in the text, and a new /wcountput directory has appeared.
View the result file in the HDFS web UI.
Open the YARN scheduling page to inspect the parameters of this run.
From the job page, jump to the history page to see the execution details.
Because checking logs node by node on a cluster is inconvenient, the web UI aggregates logs and links them to their jobs; open them from the YARN scheduling page.
6. Common errors and fixes
6.1 Firewall not disabled, or YARN not started

```bash
INFO client.RMProxy: Connecting to ResourceManager at hadoop108/192.168.10.108:8032
```
6.2 Hostname misconfiguration
Do not give a host the same name as the cluster; alternatively, delete the 127.0.0.1 hadoop101 entry from /etc/hosts.
6.3 IP address misconfiguration
(details omitted)
6.4 SSH not set up correctly
The sync script (xsync) will fail.
6.5 Starting the cluster inconsistently as root and as ren
Start the cluster as a regular user. Running it as root requires special configuration, so do not mix users; the safest approach is to always start it through the scripts.
In the Hadoop sbin directory, add the following parameters at the top of both start-dfs.sh and stop-dfs.sh:
```bash
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
```
Likewise, add the following at the top of start-yarn.sh and stop-yarn.sh:

```bash
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
```
6.6 Careless edits to the configuration files
Services will fail on startup; check the service logs for details.
6.7 Hostname not resolved

```java
java.net.UnknownHostException: hadoop102: hadoop102
    at java.net.InetAddress.getLocalHost(InetAddress.java:1475)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
```

Fixes: (1) add 192.168.10.102 hadoop102 to /etc/hosts; (2) do not use special hostnames such as hadoop or hadoop000.
6.8 Only one of the DataNode and NameNode processes will run at a time (usually a clusterID mismatch after the NameNode was re-formatted while old DataNode data remained)
6.9 jps shows no processes, but restarting the cluster reports that they are already running
The cause is stale temporary files under /tmp left behind by previously started processes. Delete the cluster-related pid files there, then restart the cluster.
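A sketch of the cleanup, demonstrated on a scratch directory rather than /tmp itself; the hadoop-<user>-<daemon>.pid file-name pattern is an assumption here:

```shell
# Create a scratch directory standing in for /tmp.
dir=$(mktemp -d)
# Simulate stale pid files left behind by a previous run.
touch "$dir/hadoop-ren-namenode.pid" "$dir/hadoop-ren-datanode.pid"
# Remove them; on a real node the same glob would be run against /tmp.
rm -f "$dir"/hadoop-*.pid
# The directory is empty again.
ls -A "$dir"
```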
6.10 jps not found
Cause: the hadoop and java environment variables have not taken effect. Fix: source /etc/profile.