c语言sscanf函数的用法是什么
289
2022-11-19
|NO.Z.00022|——————————|BigDataEnd|——|Hadoop&PB级数仓.V06|——|PB数仓.v06|会员活跃度分析|活跃会员&DWS建表&DWS加载数据|
一、活跃会员
### --- 活跃会员~~~ 活跃会员:打开应用的会员即为活跃会员;~~~ 新增会员:第一次使用应用的用户,定义为新增会员;~~~ 留存会员:某段时间的新增会员,经过一段时间后,仍继续使用应用认为是留存会员;~~~ 活跃会员指标需求:每日、每周、每月的活跃会员数~~~ DWD:会员的每日启动信息明细(会员都是活跃会员;某个会员可能会出现多次)~~~ DWS:每日活跃会员信息(关键)、每周活跃会员信息、每月活跃会员信息~~~ 每日活跃会员信息 ===> 每周活跃会员信息~~~ 每日活跃会员信息 ===> 每月活跃会员信息~~~ ADS:每日、每周、每月活跃会员数(输出)
~~~ # ADS表结构:~~~ 备注:周、月为自然周、自然月daycnt weekcnt monthcnt dt
### --- 处理过程:~~~ 建表(每日、每周、每月活跃会员信息)~~~ 每日启动明细 ===> 每日活跃会员~~~ 每日活跃会员 => 每周活跃会员;每日活跃会员 => 每月活跃会员~~~ 汇总生成ADS层的数据
二、活跃会员计算:创建DWS层表
### --- 创建DWS层表~~~ # 语法use dws;drop table if exists dws.dws_member_start_day;create table dws.dws_member_start_day(`device_id` string,`uid` string,`app_v` string,`os_type` string,`language` string,`channel` string,`area` string,`brand` string) COMMENT '会员日启动汇总'partitioned by(dt string)stored as parquet;
~~~ # 语法drop table if exists dws.dws_member_start_week;create table dws.dws_member_start_week(`device_id` string,`uid` string,`app_v` string,`os_type` string,`language` string, `channel` string,`area` string,`brand` string,`week` string) COMMENT '会员周启动汇总'PARTITIONED BY (`dt` string)stored as parquet;
~~~ # 语法drop table if exists dws.dws_member_start_month;create table dws.dws_member_start_month(`device_id` string,`uid` string,`app_v` string, `os_type` string,`language` string,`channel` string,`area` string,`brand` string,`month` string) COMMENT '会员月启动汇总'PARTITIONED BY (`dt` string)stored as parquet;
~~~ # 操作示例hive (dwd)> use dws;OKhive (dws)> drop table if exists dws.dws_member_start_day;OKhive (dws)> create table dws.dws_member_start_day > ( > `device_id` string, > `uid` string, > `app_v` string, > `os_type` string, > `language` string, > `channel` string, > `area` string, > `brand` string > ) COMMENT '会员日启动汇总' > partitioned by(dt string) > stored as parquet;OK
~~~ # 操作实例hive (dws)> drop table if exists dws.dws_member_start_week;OKhive (dws)> create table dws.dws_member_start_week > ( > `device_id` string, > `uid` string, > `app_v` string, > `os_type` string, > `language` string, > `channel` string, > `area` string, > `brand` string, > `week` string > ) COMMENT '会员周启动汇总' > PARTITIONED BY (`dt` string) > stored as parquet;OK
~~~ # 操作实例hive (dws)> drop table if exists dws.dws_member_start_month;OKhive (dws)> create table dws.dws_member_start_month( > `device_id` string, > `uid` string, > `app_v` string, > `os_type` string, > `language` string, > `channel` string, > `area` string, > `brand` string, > `month` string > ) COMMENT '会员月启动汇总' > PARTITIONED BY (`dt` string) > stored as parquet;OK
三、活跃会员计算:加载数据
### --- 创建DWS层级加载数据脚本~~~ 注意shell的引号:# ODS => DWD => DWS(每日、每周、每月活跃会员的汇总表)[root@hadoop02 ~]# vim /data/yanqidw/script/member_active/dws_load_member_start.sh#!/bin/bashsource /etc/profile# 可以输入日期;如果未输入日期取昨天的时间if [ -n "$1" ]then do_date=$1else do_date=`date -d "-1 day" +%F`fi# 定义要执行的SQL# 汇总得到每日活跃会员信息;每日数据汇总得到每周、每月数据sql="-- 汇总得到每日活跃会员insert overwrite table dws.dws_member_start_daypartition(dt='$do_date')select device_id,concat_ws('|', collect_set(uid)),concat_ws('|', collect_set(app_v)),concat_ws('|', collect_set(os_type)),concat_ws('|', collect_set(language)),concat_ws('|', collect_set(channel)),concat_ws('|', collect_set(area)),concat_ws('|', collect_set(brand))from dwd.dwd_start_logwhere dt='$do_date'group by device_id;-- 汇总得到每周活跃会员insert overwrite table dws.dws_member_start_weekpartition(dt='$do_date')select device_id,concat_ws('|', collect_set(uid)),concat_ws('|', collect_set(app_v)),concat_ws('|', collect_set(os_type)),concat_ws('|', collect_set(language)),concat_ws('|', collect_set(channel)),concat_ws('|', collect_set(area)),concat_ws('|', collect_set(brand)),date_add(next_day('$do_date', 'mo'), -7)from dws.dws_member_start_daywhere dt >= date_add(next_day('$do_date', 'mo'), -7)and dt <= '$do_date'group by device_id;-- 汇总得到每月活跃会员insert overwrite table dws.dws_member_start_monthpartition(dt='$do_date')select device_id,concat_ws('|', collect_set(uid)),concat_ws('|', collect_set(app_v)),concat_ws('|', collect_set(os_type)),concat_ws('|', collect_set(language)),concat_ws('|', collect_set(channel)),concat_ws('|', collect_set(area)),concat_ws('|', collect_set(brand)),date_format('$do_date', 'yyyy-MM')from dws.dws_member_start_daywhere dt >= date_format('$do_date', 'yyyy-MM-01')and dt <= '$do_date'group by device_id;"hive -e "$sql"
### --- 为脚本授予可执行权限并验证是否加载数据~~~ # 授予可执行权限[root@hadoop02 ~]# chmod +x /data/yanqidw/script/member_active/dws_load_member_start.sh
~~~ # 执行该脚本,加载数据[root@hadoop02 ~]# /data/yanqidw/script/member_active/dws_load_member_start.sh
~~~ # 查看每日的数据hive (dws)> show partitions dws_member_start_day;partitiondt=2021-09-28hive (dws)> select count(*) from dws.dws_member_start_day;9999 ~~~ # 查看每周的数据hive (dws)> show partitions dws_member_start_week;partitiondt=2021-09-28hive (dws)> select count(*) from dws.dws_member_start_week;9999 ~~~ # 查看每月的数据 hive (dws)> show partitions dws_member_start_month;partitiondt=2021-09-28hive (dws)> select count(*) from dws.dws_member_start_month;9999
Walter Savage Landor:strove with none,for none was worth my strife.Nature I loved and, next to Nature, Art:I warm'd both hands before the fire of life.It sinks, and I am ready to depart
——W.S.Landor
版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。
发表评论
暂时没有评论,来抢沙发吧~