|NO.Z.00021|——————————|BigDataEnd|——|Hadoop&PB级数仓.V05|——|PB数仓.v05|会员活跃度分析|DWD建表&加载数据|

网友投稿 250 2022-11-19

|NO.Z.00021|——————————|BigDataEnd|——|Hadoop&PB级数仓.V05|——|PB数仓.v05|会员活跃度分析|DWD建表&加载数据|

一、DWD层建表和数据加载

### --- DWD层建表和数据加载~~~ 主要任务:ODS(包含json串) => DWD~~~ json数据解析,丢弃无用数据(数据清洗),保留有效信息,并将数据展开,形成每日启动明细表。

2021-09-28 18:19:32.966 [main] INFO com.yanqi.ecommerce.AppStart - {"app_active":{"name":"app_active","json":{"entry":"1","action":"1","error_code":"0"},"time":1596309585861},"attr":{"area":"绍兴","uid":"2F10092A10","app_v":"1.1.16","event_type":"common","device_id":"1FB872-9A10010","os_type":"3.0","channel":"ML","language":"chinese","brand":"Huawei-2"}}2020-07-21

二、DWD层表创建和加载数据

### --- 创建DWD层表:表的格式:parquet、分区表~~~ # 语法use dwd;drop table if exists dwd.dwd_start_log;CREATE TABLE dwd.dwd_start_log(`device_id` string,`area` string,`uid` string,`app_v` string,`event_type` string,`os_type` string,`channel` string,`language` string,`brand` string,`entry` string,`action` string,`error_code` string)PARTITIONED BY (dt string)STORED AS parquet;

hive (default)> use dwd;hive (dwd)> CREATE TABLE dwd.dwd_start_log( > `device_id` string, > `area` string, > `uid` string, > `app_v` string, > `event_type` string, > `os_type` string, > `channel` string, > `language` string, > `brand` string, > `entry` string, > `action` string, > `error_code` string > ) > PARTITIONED BY (dt string) > STORED AS parquet;hive (dwd)> show tables;dwd_start_log

二、加载DWD层数据

### --- 加载DWD层数据[root@hadoop02 ~]# vim /data/yanqidw/script/member_active/dwd_load_start.sh#!/bin/bashsource /etc/profile# 可以输入日期;如果未输入日期取昨天的时间if [ -n "$1" ]then do_date=$1else do_date=`date -d "-1 day" +%F`fi# 定义要执行的SQLsql="with tmp as( select split(str, ' ')[7] line from ods.ods_start_log where dt='$do_date')insert overwrite table dwd.dwd_start_log partition(dt='$do_date')select get_json_object(line, '$.attr.device_id'),get_json_object(line, '$.attr.area'),get_json_object(line, '$.attr.uid'),get_json_object(line, '$.attr.app_v'),get_json_object(line, '$.attr.event_type'),get_json_object(line, '$.attr.os_type'),get_json_object(line, '$.attr.channel'),get_json_object(line, '$.attr.language'),get_json_object(line, '$.attr.brand'),get_json_object(line, '$.app_active.json.entry'),get_json_object(line, '$.app_active.json.action'),get_json_object(line, '$.app_active.json.error_code')from tmp;"hive -e "$sql"

### --- 为脚本授予执行权限[root@hadoop02 ~]# chmod +x /data/yanqidw/script/member_active/dwd_load_start.sh

三、执行该脚本,验证结果

### --- 执行该脚本,验证结果~~~ # 加载数据到ODS分区下~~~ # 确保ods分区下是有数据存在hive (ods)> show partitions ods.ods_start_log;OKpartitiondt=2021-09-28hive (ods)> select * from ods.ods_start_log limit 10;

~~~ # 执行该脚本[root@hadoop02 ~]# /data/yanqidw/script/member_active/dwd_load_start.sh ~~~输出参数Total MapReduce CPU Time Spent: 0 msecOK_c0 _c1 _c2 _c3 _c4 _c5 _c6 _c7 _c8 _c9 _c10 _c11

~~~ # 查看数据是否通过ODS加载到DWD下hive (dwd)> show partitions dwd_start_log;OKpartitiondt=2021-09-28

hive (dwd)> select * from dwd.dwd_start_log limit 5;OKdwd_start_log.device_id dwd_start_log.area dwd_start_log.uid dwd_start_log.app_v dwd_start_log.event_type dwd_start_log.os_type dwd_start_log.channel dwd_start_log.language dwd_start_log.brand dwd_start_log.entry dwd_start_log.action dwd_start_log.error_code dwd_start_log.dt1FB872-9A1001 连云港 2F10092A1 1.1.8 common 0.43 PN chinese iphone-7 1 0 0 2021-09-281FB872-9A1002 金昌 2F10092A2 1.1.5 common 5.8.7 OF chinese xiaomi-0 1 1 0 2021-09-281FB872-9A1003 句容 2F10092A3 1.1.16 common 0.99 YI chinese iphone-3 1 0 0 2021-09-281FB872-9A1004 肇庆 2F10092A4 1.1.6 common 6.6.2 CD chinese xiaomi-6 1 1 0 2021-09-281FB872-9A1005 武汉 2F10092A5 1.1.2 common 6.9 WG chinese xiaomi-1 1 1 0 2021-09-28

### --- 日志文件 =》 Flume =》 HDFS =》 ODS =》 DWD ODS =》 DWD;~~~ json数据的解析;数据清洗~~~ 下一步任务:DWD(会员的每日启动信息明细) => DWS(如何建表,如何加载数据)~~~ 活跃会员 ===> 新增会员 ===> 会员留存

Walter Savage Landor:strove with none,for none was worth my strife.Nature I loved and, next to Nature, Art:I warm'd both hands before the fire of life.It sinks, and I am ready to depart

——W.S.Landor

版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。

上一篇:java实现IP地址转换
下一篇:工业相机编程模型和流程及SDK接口使用
相关文章

 发表评论

暂时没有评论,来抢沙发吧~