[Hive] Hive Queries

Community contribution · 281 · 2022-08-25


Contents

I. Environment Setup
II. Hive Queries

  1. Basic Query
  2. Alias Query
  3. Conditional Query
  4. Multi-Table Join Query
  5. Multi-Table Insert
  6. Multi-Directory Output

Prerequisites: a fully distributed Hadoop cluster (one master and two workers is sufficient), plus working MySQL and Hive installations.

I. Environment Setup

Place the buyer_log and buyer_favorite data files under /data/hive-data:

Create the buyer behavior log table, named buyer_log, with five fields: ID (id), user ID (buyer_id), time (dt), location (ip), and operation type (opt_type), all of type string, using "\t" as the field delimiter:

hive> create table buyer_log
    > (id string,buyer_id string,dt string,ip string,opt_type string)
    > row format delimited fields terminated by '\t'
    > stored as textfile;
OK
Time taken: 1.752

Create the buyer favorites table, named buyer_favorite, with three fields: user ID (buyer_id), goods ID (goods_id), and time (dt), all of type string, using "\t" as the field delimiter:

hive> create table buyer_favorite
    > (buyer_id string,goods_id string,dt string)
    > row format delimited fields terminated by '\t'
    > stored as textfile;
OK
Time taken: 0.141

Load the data from those two files under the local /data/hive-data directory into the two tables just created:

hive> load data local inpath '/../home/data/hive-data/buyer_log' into table buyer_log;
Loading data to table db.buyer_log
OK
Time taken: 3.36 seconds
hive> load data local inpath '/../home/data/hive-data/buyer_favorite' into table buyer_favorite;
Loading data to table db.buyer_favorite
OK
Time taken: 0.413
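After loading, it can help to sanity-check that both tables actually contain data. A minimal sketch (the exact counts depend on your copies of the data files, so none are shown here):

```sql
-- Verify the declared schema of a loaded table
describe buyer_log;

-- Row counts as a quick sanity check (values depend on your data files)
select count(*) from buyer_log;
select count(*) from buyer_favorite;
```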


II. Hive Queries

1. Basic Query

Query all fields of the buyer_log table. When the data volume is large, avoid selecting the full dataset; here the limit keyword restricts the query to the first 10 rows:

hive> select * from buyer_log limit 10;
OK
461 10181 2010-03-26 19:45:07 123.127.164.252 1
462 10262 2010-03-26 19:55:10 123.127.164.252 1
463 20001 2010-03-29 14:28:02 221.208.129.117 2
464 20001 2010-03-29 14:28:02 221.208.129.117 1
465 20002 2010-03-30 10:56:35 222.44.94.235 2
466 20002 2010-03-30 10:56:35 222.44.94.235 1
481 10181 2010-03-31 16:48:43 123.127.164.252 1
482 10181 2010-04-01 17:35:05 123.127.164.252 1
483 10181 2010-04-02 10:34:20 123.127.164.252 1
484 20001 2010-04-04 16:38:22 221.208.129.38 1
Time taken: 1.467 seconds, Fetched: 10 row(s)


2. Alias Query

Query the id and ip fields of the buyer_log table. When a join involves many fields across tables, table aliases are commonly used:

hive> select b.id,b.ip from buyer_log b limit 10;
OK
461 123.127.164.252
462 123.127.164.252
463 221.208.129.117
464 221.208.129.117
465 222.44.94.235
466 222.44.94.235
481 123.127.164.252
482 123.127.164.252
483 123.127.164.252
484 221.208.129.38
Time taken: 0.108 seconds, Fetched: 10 row(s)


3. Conditional Query

Query the user IDs (buyer_id) from buyer_log where opt_type=1:

hive> select buyer_id from buyer_log where opt_type=1 limit 10;
OK
10181
10262
20001
20002
10181
10181
10181
20001
10181
20021
Time taken: 0.361 seconds, Fetched: 10 row(s)


4. Multi-Table Join Query

When querying two or more tables together, e.g. joining buyer_log and buyer_favorite on the user ID (buyer_id) to retrieve the dt field from buyer_log and the goods_id field from buyer_favorite, a multi-table join lets you select whichever fields you need from each table:

hive> select l.dt,f.goods_id from buyer_log l,buyer_favorite f
    > where l.buyer_id = f.buyer_id
    > limit 10;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220312204110_aa886926-12e1-4fc7-a0b7-2d21e4323941
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-03-12 20:41:29 Starting to launch local task to process map join; maximum memory = 477626368
2022-03-12 20:41:31 Dump the side-table for tag: 1 with group count: 682 into file: file:/usr/local/src/hive/tmp/ade490ef-9595-4235-9a9c-f58620ae753f/hive_2022-03-12_20-41-10_247_5739433956695466542-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2022-03-12 20:41:31 Uploaded 1 File to: file:/usr/local/src/hive/tmp/ade490ef-9595-4235-9a9c-f58620ae753f/hive_2022-03-12_20-41-10_247_5739433956695466542-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (51658 bytes)
2022-03-12 20:41:31 End of local task; Time Taken: 1.91 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1647086333827_0001, Tracking URL =
Kill Command = /usr/local/src/hadoop/bin/hadoop job -kill job_1647086333827_0001
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2022-03-12 20:42:57,007 Stage-3 map = 0%, reduce = 0%
2022-03-12 20:43:26,906 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 3.96 sec
MapReduce Total cumulative CPU time: 3 seconds 960 msec
Ended Job = job_1647086333827_0001
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1   Cumulative CPU: 3.96 sec   HDFS Read: 137752 HDFS Write: 487 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 960 msec
OK
2010-03-26 19:45:07 1000481
2010-03-26 19:45:07 1003185
2010-03-26 19:45:07 1002643
2010-03-26 19:45:07 1002994
2010-03-26 19:55:10 1003326
2010-03-29 14:28:02 1001597
2010-03-29 14:28:02 1001560
2010-03-29 14:28:02 1001650
2010-03-29 14:28:02 1002410
2010-03-29 14:28:02 1002989
Time taken: 138.793 seconds, Fetched: 10 row(s)
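The comma-separated from clause above is Hive's implicit inner-join syntax. The explicit JOIN ... ON form is equivalent here and generally clearer, and it is the form required for outer joins; a sketch of the same query:

```sql
-- Same inner join, written with explicit JOIN ... ON syntax
select l.dt, f.goods_id
from buyer_log l
join buyer_favorite f on (l.buyer_id = f.buyer_id)
limit 10;
```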


5. Multi-Table Insert

Multi-table insert writes the same source data into several tables within a single statement; the source is scanned only once to complete every insert, which is very efficient. Here the buyer behavior log table buyer_log is the source, and two new tables, buyer_log1 and buyer_log2, are created as targets:

hive> create table buyer_log1 like buyer_log;
OK
Time taken: 1.199 seconds
hive> create table buyer_log2 like buyer_log;
OK
Time taken: 0.095

Insert the data from buyer_log into both buyer_log1 and buyer_log2:

hive> from buyer_log
    > insert overwrite table buyer_log1 select *
    > insert overwrite table buyer_log2 select *;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220312205124_ae99b1d8-9ada-4358-9b64-9c3d61c6de76
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1647086333827_0002, Tracking URL =
Kill Command = /usr/local/src/hadoop/bin/hadoop job -kill job_1647086333827_0002
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2022-03-12 20:51:55,535 Stage-2 map = 0%, reduce = 0%
2022-03-12 20:52:14,808 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.05 sec
MapReduce Total cumulative CPU time: 2 seconds 50 msec
Ended Job = job_1647086333827_0002
Stage-5 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-10 is filtered out by condition resolver.
Stage-12 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.64.183:9000/user/hive/warehouse/db.db/buyer_log1/.hive-staging_hive_2022-03-12_20-51-24_797_2412140504440982474-1/-ext-10000
Moving data to directory hdfs://192.168.64.183:9000/user/hive/warehouse/db.db/buyer_log2/.hive-staging_hive_2022-03-12_20-51-24_797_2412140504440982474-1/-ext-10002
Loading data to table db.buyer_log1
Loading data to table db.buyer_log2
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1   Cumulative CPU: 2.05 sec   HDFS Read: 14432909 HDFS Write: 28293834 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 50 msec
OK
Time taken: 51.7
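Each insert branch in a multi-table insert can also carry its own select list and where clause, so a single scan can feed differently filtered targets. A sketch reusing the two tables created above (the opt_type values '1' and '2' follow the sample data shown earlier):

```sql
-- One scan of buyer_log, two differently filtered destination tables
from buyer_log
insert overwrite table buyer_log1 select * where opt_type = '1'
insert overwrite table buyer_log2 select * where opt_type = '2';
```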


6. Multi-Directory Output

In the same way, one table can be exported to several local directories at once, avoiding a repeated from scan and improving efficiency. Export the buyer behavior log table buyer_log to the local directories /data/hive-data/out1 and /data/hive-data/out2:

[root@server hive-data]# mkdir ./out1    # first create the two output directories
[root@server hive-data]# mkdir ./out2
[root@server hive-data]# ll
total 23084
-rw-r--r--. 1 root root   102889 Mar  6 10:52 buyer_favorite
-rw-r--r--. 1 root root 14427403 Mar  6 10:52 buyer_log
-rw-r--r--. 1 root root     2164 Mar  6 10:52 cat_group
-rw-r--r--. 1 root root   208799 Mar  6 10:52 goods
-rw-r--r--. 1 root root    82421 Mar  6 10:52 goods_visit
-rw-r--r--. 1 root root  8796085 Mar  6 10:52 order_items
drwxr-xr-x. 2 root root       43 Mar  6 11:50 out
drwxr-xr-x. 2 root root        6 Mar 12 20:57 out1
drwxr-xr-x. 2 root root        6 Mar 12 20:57 out2
-rw-r--r--. 1 root root      287 Mar  6 10:52 sydata.txt

hive> from buyer_log
    > insert overwrite local directory '/home/data/hive-data/out1' select *
    > insert overwrite local directory '/home/data/hive-data/out2' select *;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220312210028_a1b22b5b-255a-44b0-9b87-8c43ed291451
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1647086333827_0003, Tracking URL =
Kill Command = /usr/local/src/hadoop/bin/hadoop job -kill job_1647086333827_0003
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2022-03-12 21:00:48,788 Stage-2 map = 0%, reduce = 0%
2022-03-12 21:00:55,289 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.25 sec
MapReduce Total cumulative CPU time: 2 seconds 250 msec
Ended Job = job_1647086333827_0003
Moving data to local directory /home/data/hive-data/out1
Moving data to local directory /home/data/hive-data/out2
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1   Cumulative CPU: 2.25 sec   HDFS Read: 14432070 HDFS Write: 28293676 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 250 msec
OK
Time taken: 29.427

Check the files written locally:

[root@server hive-data]# ll ./out1
total 13816
-rw-r--r--. 1 root root 14146838 Mar 12 21:00 000000_0
[root@server hive-data]# ll ./out2
total 13816
-rw-r--r--. 1 root root 14146838 Mar 12 21:00 000000_0
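Note that insert overwrite [local] directory writes its output with Hive's default field delimiter, the non-printing \001 (Ctrl-A) character, not tabs. Hive 0.11 and later accept a row format clause on a directory export; a sketch assuming tab-separated output is wanted:

```sql
-- Export with an explicit field delimiter instead of the default \001
insert overwrite local directory '/home/data/hive-data/out1'
row format delimited fields terminated by '\t'
select * from buyer_log;
```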
