2022-11-19
|NO.Z.00024|----------|BigDataEnd|--|Hadoop & PB-Scale Data Warehouse.V08|--|PB Data Warehouse.v08|Member Activity Analysis|New Members & How to Compute New Members|
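I. New Members
### --- New members
~~~     Retained member: a member who was new in a given period and is still using the app after some time has passed.
~~~     New member: a user who uses the app for the first time. A device that uninstalls and then reinstalls the app is not counted as new again.
~~~     Compute new members first => then compute member retention.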
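### --- Case analysis:
~~~     # Requirement: number of new members per day
~~~     # 08-02:
~~~     DWD: daily member startup detail (ids 95-110); all-member information (ids 1-100)
~~~     New members: 101-110
~~~     New member data + old all-member information = new all-member information (1-110)
~~~     # 08-03:
~~~     DWD: daily member startup detail (ids 100-120); all-member information (ids 1-110)
~~~     New members: 111-120
~~~     New member data + old all-member information = new all-member information (1-120)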
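The case analysis above boils down to a set difference followed by a set union. A minimal Python sketch, assuming member ids are plain integers and using the id ranges from the example (the function name `daily_new_members` is hypothetical and not part of the original pipeline):

```python
def daily_new_members(active_ids, all_member_ids):
    """Return (new_members, updated_all_members) for one day.

    active_ids:     ids seen in that day's startup detail (DWD).
    all_member_ids: ids of every member known before that day.
    """
    new_members = set(active_ids) - set(all_member_ids)
    updated_all = set(all_member_ids) | new_members
    return new_members, updated_all


# 08-02: startup detail covers ids 95-110, known members are 1-100
all_members = set(range(1, 101))
new_0802, all_members = daily_new_members(range(95, 111), all_members)
print(sorted(new_0802))  # ids 101 through 110

# 08-03: startup detail covers ids 100-120, known members are now 1-110
new_0803, all_members = daily_new_members(range(100, 121), all_members)
print(sorted(new_0803))  # ids 111 through 120
```

Members who were already known (for example ids 95-100 on 08-02) drop out of the difference, so only first-time ids are counted.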
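### --- Computation steps:
~~~     Compute the new members
~~~     Update the all-member information
### --- Improved method:
~~~     Add a date column to the all-member information that records the day on which each member became a new member
~~~     Only one table is needed: all-member information (id, dt)
~~~     Insert the new members into the all-member table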
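The single-table idea can be sketched in Python as a mapping from member id to the day that member was first seen. This is only an illustration under stated assumptions: the helper `add_new_members` and the sample ids are hypothetical, and a dict stands in for the (id, dt) table.

```python
from collections import Counter


def add_new_members(member_first_day, active_ids, dt):
    """Insert ids not yet in the (id -> dt) table and return the inserted ids."""
    new_ids = [i for i in active_ids if i not in member_first_day]
    for i in new_ids:
        member_first_day[i] = dt
    return new_ids


# All-member table before the daily run: ids 1-6 joined on 2020-08-01 (sample data)
members = {i: "2020-08-01" for i in range(1, 7)}

# Daily run for 2020-08-02: ids 4-9 started the app that day
new_ids = add_new_members(members, [4, 5, 6, 7, 8, 9], "2020-08-02")
print(new_ids)  # [7, 8, 9]

# Because dt is stored with each id, new members per day is a simple group-by
new_per_day = Counter(members.values())
print(new_per_day["2020-08-02"])  # 3

# Re-running the same day inserts nothing, since the ids are already present
rerun = add_new_members(members, [4, 5, 6, 7, 8, 9], "2020-08-02")
print(rerun)  # []
```

Keeping the first-seen date next to the id means the daily new-member count can be read back from the same table at any time, without a separate per-day result table.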
II. Worked Example: Computing New Members
### --- Step 1: daily startup table t1
~~~     # Daily startup table t1: how to compute new members
~~~     # Syntax
-- daily startup table => DWS
use test;
drop table t1;
create table t1(id int, dt string)
row format delimited fields terminated by ',';
load data local inpath '/data/yanqidw/logs/data/t10.dat' into table t1;
select * from t1;
~~~     # Prepare the dataset
[root@hadoop02 ~]# vim /data/yanqidw/logs/data/t10.dat
4,2020-08-02
5,2020-08-02
6,2020-08-02
7,2020-08-02
8,2020-08-02
9,2020-08-02
~~~     # Create the table and inspect the data
~~~     # Create the table
hive (default)> use test;
hive (test)> drop table t1;
hive (test)> create table t1(id int, dt string)
           > row format delimited fields terminated by ',';
hive (test)> load data local inpath '/data/yanqidw/logs/data/t10.dat' into table t1;
~~~     # Inspect the loaded data
hive (test)> select * from t1;
t1.id   t1.dt
4       2020-08-02
5       2020-08-02
6       2020-08-02
7       2020-08-02
8       2020-08-02
9       2020-08-02
### --- Step 2: full data table t2
~~~     # Full data table t2: how to compute new members
~~~     # Syntax
-- full data => DWS
drop table t2;
create table t2(id int, dt string)
row format delimited fields terminated by ',';
load data local inpath '/data/yanqidw/logs/data/t2.dat' into table t2;
~~~     # Prepare the dataset
[root@hadoop02 ~]# vim /data/yanqidw/logs/data/t2.dat
1,2020-08-01
2,2020-08-01
3,2020-08-01
4,2020-08-01
5,2020-08-01
6,2020-08-01
~~~     # Create the table and inspect the data
~~~     # Create the table
hive (test)> drop table t2;
hive (test)> create table t2(id int, dt string)
           > row format delimited fields terminated by ',';
hive (test)> load data local inpath '/data/yanqidw/logs/data/t2.dat' into table t2;
~~~     # Inspect the loaded data
hive (test)> select * from t2;
t2.id   t2.dt
1       2020-08-01
2       2020-08-01
3       2020-08-01
4       2020-08-01
5       2020-08-01
6       2020-08-01
### --- Step 3: find the new users of 2020-08-02 and insert them into t2
~~~     # Find the new users of 2020-08-02
~~~     # Syntax
~~~     Find the new users of 2020-08-02
select t1.id, t1.dt, t2.id, t2.dt
from t1 left join t2 on t1.id=t2.id
where t1.dt="2020-08-02";

select t1.id, t1.dt
from t1 left join t2 on t1.id=t2.id
where t1.dt="2020-08-02"
and t2.id is null;
~~~     # Example session
hive (test)> select t1.id, t1.dt, t2.id, t2.dt
           > from t1 left join t2 on t1.id=t2.id
           > where t1.dt="2020-08-02";
~~~     Output
t1.id   t1.dt        t2.id   t2.dt
4       2020-08-02   4       2020-08-01
5       2020-08-02   5       2020-08-01
6       2020-08-02   6       2020-08-01
7       2020-08-02   NULL    NULL
8       2020-08-02   NULL    NULL
9       2020-08-02   NULL    NULL
hive (test)> select t1.id, t1.dt
           > from t1 left join t2 on t1.id=t2.id
           > where t1.dt="2020-08-02"
           > and t2.id is null;
~~~     Output
t1.id   t1.dt
7       2020-08-02
8       2020-08-02
9       2020-08-02
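The `left join ... where t2.id is null` pattern keeps only the rows of t1 that have no match in t2, which is exactly the set of first-time ids. The sketch below replays that query with the same sample rows on an in-memory SQLite database through Python's `sqlite3` module. SQLite is used here only as a stand-in so the example runs without a cluster; it is an assumption of this sketch, not part of the original setup, and Hive-specific clauses such as `row format delimited` are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t1(id int, dt text)")  # daily startup table
conn.execute("create table t2(id int, dt text)")  # all-member table

# Same sample rows as t10.dat and t2.dat
conn.executemany("insert into t1 values (?, ?)",
                 [(i, "2020-08-02") for i in range(4, 10)])
conn.executemany("insert into t2 values (?, ?)",
                 [(i, "2020-08-01") for i in range(1, 7)])

# Rows of t1 without a partner in t2 come back with t2.id = NULL
new_users = conn.execute(
    """
    select t1.id, t1.dt
    from t1 left join t2 on t1.id = t2.id
    where t1.dt = '2020-08-02'
      and t2.id is null
    order by t1.id
    """
).fetchall()

print(new_users)
# [(7, '2020-08-02'), (8, '2020-08-02'), (9, '2020-08-02')]
```

Ids 4, 5 and 6 find a match in t2 and are filtered out by the `is null` condition, so only 7, 8 and 9 remain.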
~~~     # Insert the new users found for 2020-08-02 into t2
~~~     # Syntax
~~~     Insert the new users found for 2020-08-02 into t2
insert into table t2
select t1.id, t1.dt
from t1 left join t2 on t1.id=t2.id
where t1.dt="2020-08-02"
and t2.id is null;
~~~     Check the result
select * from t2;
~~~     # Example session
hive (test)> insert into table t2
           > select t1.id, t1.dt
           > from t1 left join t2 on t1.id=t2.id
           > where t1.dt="2020-08-02"
           > and t2.id is null;
~~~     Output
hive (test)> select * from t2;
t2.id   t2.dt
7       2020-08-02
8       2020-08-02
9       2020-08-02
1       2020-08-01
2       2020-08-01
3       2020-08-01
4       2020-08-01
5       2020-08-01
6       2020-08-01
### --- Step 4: load the 2020-08-03 data into t1
~~~     # Load the 2020-08-03 data into table t1
~~~     # Syntax
~~~     t1 loads the 2020-08-03 data
load data local inpath '/data/yanqidw/logs/data/t3.dat' into table t1;
~~~     # Prepare the data
[root@hadoop02 ~]# vim /data/yanqidw/logs/data/t3.dat
14,2020-08-03
15,2020-08-03
16,2020-08-03
17,2020-08-03
18,2020-08-03
19,2020-08-03
~~~     # Example session
~~~     t1 loads the 2020-08-03 data
hive (test)> load data local inpath '/data/yanqidw/logs/data/t3.dat' into table t1;
~~~     # Inspect the data now in t1
hive (test)> select * from t1;
t1.id   t1.dt
4       2020-08-02
5       2020-08-02
6       2020-08-02
7       2020-08-02
8       2020-08-02
9       2020-08-02
14      2020-08-03
15      2020-08-03
16      2020-08-03
17      2020-08-03
18      2020-08-03
19      2020-08-03
### --- Step 5: insert the new users found for 2020-08-03 into t2
~~~     # Insert the new users found for 2020-08-03 into t2
~~~     # Syntax
~~~     Insert the new users found for 2020-08-03 into t2
insert into table t2
select t1.id, t1.dt
from t1 left join t2 on t1.id=t2.id
where t1.dt="2020-08-03"
and t2.id is null;
~~~     Check the result
select * from t2;
~~~     # Example session
hive (test)> insert into table t2
           > select t1.id, t1.dt
           > from t1 left join t2 on t1.id=t2.id
           > where t1.dt="2020-08-03"
           > and t2.id is null;
hive (test)> select * from t2;
t2.id   t2.dt
7       2020-08-02
8       2020-08-02
9       2020-08-02
14      2020-08-03
15      2020-08-03
16      2020-08-03
17      2020-08-03
18      2020-08-03
19      2020-08-03
1       2020-08-01
2       2020-08-01
3       2020-08-01
4       2020-08-01
5       2020-08-01
6       2020-08-01
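Once t2 carries the dt column, the original requirement (number of new members per day) can be read straight from it with a group-by. The sketch below replays both daily inserts and then counts rows per dt. It again assumes an in-memory SQLite database driven from Python as a stand-in for Hive, and the final `group by` query is an illustration added here, not a statement taken from the original walkthrough.

```python
import sqlite3

# Same anti-join insert as in steps 3 and 5, with the day as a parameter
INSERT_NEW = """
insert into t2
select t1.id, t1.dt
from t1 left join t2 on t1.id = t2.id
where t1.dt = ?
  and t2.id is null
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table t1(id int, dt text)")  # daily startup table
conn.execute("create table t2(id int, dt text)")  # all-member table

# Sample rows matching t2.dat, t10.dat and t3.dat
conn.executemany("insert into t2 values (?, ?)",
                 [(i, "2020-08-01") for i in range(1, 7)])
conn.executemany("insert into t1 values (?, ?)",
                 [(i, "2020-08-02") for i in range(4, 10)])
conn.executemany("insert into t1 values (?, ?)",
                 [(i, "2020-08-03") for i in range(14, 20)])

# One insert per day, in date order
for day in ("2020-08-02", "2020-08-03"):
    conn.execute(INSERT_NEW, (day,))

# New members per day, taken from the dt column of the all-member table
daily_new = conn.execute(
    "select dt, count(*) from t2 group by dt order by dt"
).fetchall()

print(daily_new)
# [('2020-08-01', 6), ('2020-08-02', 3), ('2020-08-03', 6)]
```

The counts match the walkthrough: six initial members on 2020-08-01, three new ids (7-9) on 2020-08-02 and six new ids (14-19) on 2020-08-03.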