Hive on Spark Configuration

User-contributed · 290 · 2022-11-23


1. Install Java, Maven, Scala, Hadoop, MySQL, and Hive (omitted here).

2. Build Spark without Hive support:

    ./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-2.6,parquet-provided"

3. Install Spark:

    tar -zxvf spark-1.6.0-bin-hadoop2-without-hive.tgz -C /opt/cdh5/

4. Configure Spark.

spark-env.sh:

    export JAVA_HOME=/opt/service/jdk1.8.0_151
    export SCALA_HOME=/opt/service/scala-2.10.5
    export HADOOP_HOME=/opt/cdh5/hadoop-2.6.0-cdh5.10.0
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HIVE_CONF_DIR=/opt/cdh5/hive-2.1.0/conf
    export SPARK_WORKER_CORES=4
    export SPARK_WORKER_INSTANCES=4
    export SPARK_WORKER_MEMORY=1g
    export SPARK_DRIVER_MEMORY=1g
    export SPARK_MASTER_IP=chavin.king
    export SPARK_LIBRARY_PATH=/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/lib
    export SPARK_MASTER_WEBUI_PORT=8080
    export SPARK_WORKER_WEBUI_PORT=8081
    export SPARK_WORKER_DIR=/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/work
    export SPARK_MASTER_PORT=7077
    export SPARK_WORKER_PORT=7078
    export SPARK_LOG_DIR=/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/log

spark-defaults.conf:

    #spark.master                    yarn
    spark.master                     spark://chavin.king:7077
    spark.home                       /opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs://chavin.king:8020/spark-log
    spark.serializer                 org.apache.spark.serializer.KryoSerializer
    spark.executor.memory            1g
    spark.driver.memory              1g
    spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

slaves:

    chavin.king

5. Configure YARN.

yarn-site.xml:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

6. Configure Hive.

hive-site.xml:

    <property>
      <name>hive.execution.engine</name>
      <value>spark</value>
    </property>
    <property>
      <name>hive.enable.spark.execution.engine</name>
      <value>true</value>
    </property>
    <property>
      <name>spark.home</name>
      <value>/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive</value>
    </property>
    <property>
      <name>spark.master</name>
      <value>spark://chavin.king:7077</value>
    </property>
    <property>
      <name>spark.eventLog.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>spark.eventLog.dir</name>
      <value>hdfs://chavin.king:8020/spark-log</value>
    </property>
    <property>
      <name>spark.serializer</name>
      <value>org.apache.spark.serializer.KryoSerializer</value>
    </property>
    <property>
      <name>spark.executor.memory</name>
      <value>1g</value>
    </property>
    <property>
      <name>spark.driver.memory</name>
      <value>1g</value>
    </property>
    <property>
      <name>spark.executor.extraJavaOptions</name>
      <value>-XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"</value>
    </property>

7. Add the Spark jars to Hive:

    cp /opt/software/spark-1.6.0/core/target/spark-core_2.10-1.6.0.jar /opt/cdh5/hive-2.1.0/lib/
    ln -s /opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-hadoop2.6.0.jar /opt/cdh5/hive-2.1.0/lib/
    bin/hdfs dfs -put /opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-hadoop2.6.0.jar hdfs://chavin.king:8020/spark-assembly-1.6.0-hadoop2.6.0.jar

Then add to hive-site.xml:

    <property>
      <name>spark.yarn.jar</name>
      <value>hdfs://chavin.king:8020/spark-assembly-1.6.0-hadoop2.6.0.jar</value>
    </property>

8. Verify that Hive on Spark is configured correctly:

    $ bin/hive
    which: no hbase in (/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/bin:/opt/service/maven-3.3.3/bin:/opt/service/scala-2.10.5/bin:/opt/service/jdk1.8.0_151/bin:/opt/service/jdk1.8.0_151/jre/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/hadoop/.local/bin:/home/hadoop/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/cdh5/hive-2.1.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/cdh5/hadoop-2.6.0-cdh5.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

    Logging initialized using configuration in file:/opt/cdh5/hive-2.1.0/conf/hive-log4j2.properties Async: true
    hive (default)> show tables;
    OK
    tab_name
    t1
    Time taken: 0.966 seconds, Fetched: 1 row(s)
    hive (default)> select count(*) from t1;
    Query ID = hadoop_20171204024017_cda99c42-21eb-480f-9d2a-e0dbb18a9b63
    Total jobs = 1
    Launching Job 1 out of 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Spark Job = e8b4ccc6-2dfa-43b9-99cc-7a066e2c0a0f
    Query Hive on Spark job[0] stages:
    0
    1
    Status: Running (Hive on Spark job[0])
    Job Progress Format
    CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
    2017-12-04 02:40:32,861    Stage-0_0: 0/1    Stage-1_0: 0/1
    ... ...
    2017-12-04 02:44:11,388    Stage-0_0: 1/1 Finished    Stage-1_0: 0(+1)/1
    2017-12-04 02:44:50,826    Stage-0_0: 1/1 Finished    Stage-1_0: 1/1 Finished
    Status: Finished successfully in 268.11 seconds
    OK
    c0
    3
    Time taken: 338.493 seconds, Fetched: 1 row(s)
    hive (default)> exit;
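Hand-editing the repetitive <property> blocks in step 6 is error-prone, so it can help to generate them from a table of settings. Below is a minimal Python sketch of that idea; the hostname chavin.king and the /opt/cdh5 paths are the example values used in this tutorial, and the render_properties helper is a hypothetical name, not part of Hive or Spark.

```python
# Sketch: generate the hive-site.xml <property> entries from step 6.
# The hostname/paths are this tutorial's example values; adjust for your cluster.
import xml.etree.ElementTree as ET

HIVE_SPARK_PROPS = {
    "hive.execution.engine": "spark",
    "hive.enable.spark.execution.engine": "true",
    "spark.home": "/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive",
    "spark.master": "spark://chavin.king:7077",
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "hdfs://chavin.king:8020/spark-log",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.executor.memory": "1g",
    "spark.driver.memory": "1g",
    "spark.yarn.jar": "hdfs://chavin.king:8020/spark-assembly-1.6.0-hadoop2.6.0.jar",
}

def render_properties(props):
    """Return a well-formed <configuration> fragment for hive-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; available in Python 3.9+
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(render_properties(HIVE_SPARK_PROPS))
```

Generating the fragment this way also catches the easy-to-make spelling mistakes in property names (e.g. eventLog, memory), since every entry comes from one dictionary.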

