Hive on Spark configuration
1. Install Java, Maven, Scala, Hadoop, MySQL, and Hive
(Omitted.)
2. Build Spark
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-2.6,parquet-provided"
3. Install Spark
tar -zxvf spark-1.6.0-bin-hadoop2-without-hive.tgz -C /opt/cdh5/
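The archive name used in step 3 follows from the --name value passed in step 2: make-distribution.sh writes its output as spark-<version>-bin-<name>.tgz. A minimal sketch of that relationship; the variable names are my own, only the values come from the steps above.

```shell
# Hypothetical helper variables; the values are taken from steps 2 and 3.
SPARK_VERSION="1.6.0"
DIST_NAME="hadoop2-without-hive"   # the value passed to --name
INSTALL_ROOT="/opt/cdh5"           # the directory given to tar -C

# Archive produced by make-distribution.sh, and the directory it unpacks to.
TARBALL="spark-${SPARK_VERSION}-bin-${DIST_NAME}.tgz"
SPARK_HOME_DIR="${INSTALL_ROOT}/${TARBALL%.tgz}"

echo "archive:    ${TARBALL}"
echo "spark home: ${SPARK_HOME_DIR}"
```

The resulting directory is the one that the later configuration files refer to as spark.home.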
4. Configure Spark
:spark-env.sh
export JAVA_HOME=/opt/service/jdk1.8.0_151
export SCALA_HOME=/opt/service/scala-2.10.5
export HADOOP_HOME=/opt/cdh5/hadoop-2.6.0-cdh5.10.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_CONF_DIR=/opt/cdh5/hive-2.1.0/conf
export SPARK_WORKER_CORES=4
export SPARK_WORKER_INSTANCES=4
export SPARK_WORKER_MEMORY=1g
export SPARK_DRIVER_MEMORY=1g
export SPARK_MASTER_IP=chavin.king
export SPARK_LIBRARY_PATH=/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/lib
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_WORKER_DIR=/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/work
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_PORT=7078
export SPARK_LOG_DIR=/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/log
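Several of these variables point at installation directories, and a mistyped path only shows up later as a startup failure. A minimal sketch of a pre-flight check; check_dirs is a hypothetical helper, not part of Spark.

```shell
# check_dirs NAME...: for each named shell variable, report whether its
# value is an existing directory. Returns non-zero if any one is missing.
# (Hypothetical helper; pass variable names, not their values.)
check_dirs() {
    status=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -d "$value" ]; then
            echo "ok      $name=$value"
        else
            echo "missing $name=$value"
            status=1
        fi
    done
    return "$status"
}

# Usage, after sourcing conf/spark-env.sh in the current shell:
#   check_dirs JAVA_HOME SCALA_HOME HADOOP_HOME HADOOP_CONF_DIR HIVE_CONF_DIR
```

SPARK_WORKER_DIR and SPARK_LOG_DIR are left out of the usage line on the assumption that they may not exist before the first start.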
:spark-defaults.conf
#spark.master yarn
spark.master spark://chavin.king:7077
spark.home /opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive
spark.eventLog.enabled true
spark.eventLog.dir hdfs://chavin.king:8020/spark-log
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.executor.memory 1g
spark.driver.memory 1g
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
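With spark.eventLog.enabled set to true, the directory named by spark.eventLog.dir has to exist before an application starts, as far as I know Spark does not create it. A minimal sketch that reads a value back out of a file in this "key value" format; conf_get and the scratch file are hypothetical, and the hdfs command is shown only as a comment because it needs a running cluster.

```shell
# conf_get FILE KEY: print the value of KEY from a whitespace-separated
# properties file. Comment lines are skipped; the last occurrence wins.
conf_get() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }
        $1 == key {
            line = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", line)   # drop the key, keep the rest
            value = line
        }
        END { if (value != "") print value }
    ' "$1"
}

# Scratch copy of three of the lines above, for illustration only.
conf=$(mktemp)
printf '%s\n' \
    '#spark.master yarn' \
    'spark.master spark://chavin.king:7077' \
    'spark.eventLog.dir hdfs://chavin.king:8020/spark-log' > "$conf"

conf_get "$conf" spark.eventLog.dir

# On the cluster, the directory could then be created with:
#   hdfs dfs -mkdir -p "$(conf_get conf/spark-defaults.conf spark.eventLog.dir)"
```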
:slaves
chavin.king
5. Configure YARN
:yarn-site.xml
yarn.resourcemanager.scheduler.class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
6. Configure Hive
hive.execution.engine spark
hive.enable.spark.execution.engine true
spark.home /opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive
spark.master spark://chavin.king:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://chavin.king:8020/spark-log
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.executor.memory 1g
spark.driver.memory 1g
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
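The settings above are listed as "name value" pairs, but in hive-site.xml each one is written as a <property> element inside <configuration>. A minimal sketch of that conversion; props_to_xml is a hypothetical helper, and the property names in the sample input are spelled the way Spark documents them (spark.eventLog.enabled, spark.executor.memory).

```shell
# props_to_xml: read "name value" lines on stdin and print one
# hive-site.xml <property> element per line. Values are copied as-is,
# so a value containing &, < or > would need XML escaping by hand.
props_to_xml() {
    while read -r name value; do
        [ -n "$name" ] || continue      # skip blank lines
        printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' \
            "$name" "$value"
    done
}

# A few of the properties from this step, for illustration.
props_to_xml <<'EOF'
hive.execution.engine spark
spark.master spark://chavin.king:7077
spark.eventLog.enabled true
spark.executor.memory 1g
EOF
```

The printed elements would be pasted between the <configuration> tags of hive-site.xml. A misspelled property name is not rejected there, it is just never read, so the names are worth checking against the Spark configuration reference.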
7. Add the Spark JARs to Hive:
cp /opt/software/spark-1.6.0/core/target/spark-core_2.10-1.6.0.jar /opt/cdh5/hive-2.1.0/lib/
ln -s /opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-hadoop2.6.0.jar /opt/cdh5/hive-2.1.0/lib/
bin/hdfs dfs -put /opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-hadoop2.6.0.jar hdfs://chavin.king:8020/spark-assembly-1.6.0-hadoop2.6.0.jar
Add to hive-site.xml:
spark.yarn.jar hdfs://chavin.king:8020/spark-assembly-1.6.0-hadoop2.6.0.jar
8. Verify that Hive on Spark is configured correctly
$ bin/hive
which: no hbase in (/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/bin:/opt/service/maven-3.3.3/bin:/opt/service/scala-2.10.5/bin:/opt/service/jdk1.8.0_151/bin:/opt/service/jdk1.8.0_151/jre/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/hadoop/.local/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cdh5/hive-2.1.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cdh5/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cdh5/hadoop-2.6.0-cdh5.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/opt/cdh5/hive-2.1.0/conf/hive-log4j2.properties Async: true
hive (default)> show tables ;
OK
tab_name
t1
Time taken: 0.966 seconds, Fetched: 1 row(s)
hive (default)> select count(*) from t1;
Query ID = hadoop_20171204024017_cda99c42-21eb-480f-9d2a-e0dbb18a9b63
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Spark Job = e8b4ccc6-2dfa-43b9-99cc-7a066e2c0a0f

Query Hive on Spark job[0] stages:
0
1

Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2017-12-04 02:40:32,861 Stage-0_0: 0/1 Stage-1_0: 0/1
... ...
2017-12-04 02:44:11,388 Stage-0_0: 1/1 Finished Stage-1_0: 0(+1)/1
2017-12-04 02:44:50,826 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 268.11 seconds
OK
c0
3
Time taken: 338.493 seconds, Fetched: 1 row(s)
hive (default)> exit;
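The two lines that matter in this session are "Starting Spark Job", which shows the query was handed to Spark rather than MapReduce, and "Status: Finished successfully". A minimal sketch that checks a captured console log for both; spark_job_ok and the log path are hypothetical, and the matched strings are taken from the output shown here, so other Hive versions may word them differently.

```shell
# spark_job_ok FILE: succeed only if the captured Hive console output
# shows that a Spark job was started and that it finished successfully.
spark_job_ok() {
    grep -q 'Starting Spark Job' "$1" &&
    grep -q 'Status: Finished successfully' "$1"
}

# Usage sketch, run from the Hive home directory. Both output streams are
# captured because the progress lines may not go to stdout:
#   bin/hive -e 'select count(*) from t1;' > /tmp/hive-on-spark.log 2>&1
#   spark_job_ok /tmp/hive-on-spark.log && echo "query ran on Spark"
```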