|NO.Z.00019|——————————|BigDataEnd|——|Hadoop&PB级数仓.V03|——|PB数仓.v03|会员活跃度分析|json数据处理&使用UDF处理json串|

网友投稿 256 2022-11-19

|NO.Z.00019|——————————|BigDataEnd|——|Hadoop&PB级数仓.V03|——|PB数仓.v03|会员活跃度分析|json数据处理&使用UDF处理json串|

一、使用UDF处理

### --- 自定义UDF处理json串中的数组。自定义UDF函数:~~~ 输入:json串、数组的key~~~ 输出:字符串数组

二、创建一个maven项目

### --- 创建一个maven项目:cn.yanqi.dw### --- 添加pom.xml依赖 org.apache.hive hive-exec 2.3.7 provided

三、编程代码实现

### --- 编程代码实现package cn.yanqi.dw.hive.udf;import com.alibaba.fastjson.JSON;import com.alibaba.fastjson.JSONArray;import com.alibaba.fastjson.JSONException;import com.alibaba.fastjson.JSONObject;import com.google.common.base.Strings;import org.apache.hadoop.hive.ql.exec.UDF;import org.junit.Test;import java.util.ArrayList;public class ParseJsonArray extends UDF { public ArrayList evaluate(String jsonStr, String arrKey){ // 传入空字符串,返回null if (Strings.isNullOrEmpty(jsonStr)) { return null; } try{ // 获取jsonArray JSONObject object = JSON.parseObject(jsonStr); JSONArray jsonArray = object.getJSONArray(arrKey); ArrayList result = new ArrayList<>(); for (Object o: jsonArray){ result.add(o.toString()); } return result; } catch (JSONException e){ return null; } } @Test public void JunitParseJsonArray(){ String str = "{\"id\": 1,\"ids\": [101,102,103],\"total_number\": 3}"; String key = "ids"; ArrayList evaluate = evaluate(str, key); System.out.println(JSON.toJSONString(evaluate)); }}

四、编译打印

### --- 编译打印D:\JAVA\jdk1.8.0_231\bin\java.exe -ea -Didea.test.cyclic.buffer.size=1048576 "-javaagent:D:\IntelliJIDEA\IntelliJ IDEA 2019.3.3\lib\idea_rt.jar=64989:D:\IntelliJIDEA\IntelliJ IDEA 2019.3.3\bin" -Dfile.encoding=UTF-8 -classpath "D:\IntelliJIDEA\IntelliJ IDEA 2019.3.3\lib\idea_rt.jar;D:\IntelliJIDEA\IntelliJ IDEA 2019.3.3\plugins\junit\lib\junit5-rt.jar;D:\IntelliJIDEA\IntelliJ IDEA 2019.3.3\plugins\junit\lib\junit-rt.jar;D:\JAVA\jdk1.8.0_231\jre\lib\charsets.jar;D:\JAVA\jdk1.8.0_231\jre\lib\deploy.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\access-bridge-64.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\cldrdata.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\dnsns.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\jaccess.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\jfxrt.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\localedata.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\nashorn.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\sunec.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\sunjce_provider.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\sunmscapi.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\sunpkcs11.jar;D:\JAVA\jdk1.8.0_231\jre\lib\ext\zipfs.jar;D:\JAVA\jdk1.8.0_231\jre\lib\javaws.jar;D:\JAVA\jdk1.8.0_231\jre\lib\jce.jar;D:\JAVA\jdk1.8.0_231\jre\lib\jfr.jar;D:\JAVA\jdk1.8.0_231\jre\lib\jfxswt.jar;D:\JAVA\jdk1.8.0_231\jre\lib\jsse.jar;D:\JAVA\jdk1.8.0_231\jre\lib\management-agent.jar;D:\JAVA\jdk1.8.0_231\jre\lib\plugin.jar;D:\JAVA\jdk1.8.0_231\jre\lib\resources.jar;D:\JAVA\jdk1.8.0_231\jre\lib\rt.jar;E:\NO.Z.10000——javaproject\NO.Z.00002.Hadoop\cn.yanqi.dw\target\classes;C:\Users\Administrator\.m2\repository\org\apache\flume\flume-ng-core\1.9.0\flume-ng-core-1.9.0.jar;C:\Users\Administrator\.m2\repository\org\apache\flume\flume-ng-sdk\1.9.0\flume-ng-sdk-1.9.0.jar;C:\Users\Administrator\.m2\repository\org\apache\flume\flume-ng-configuration\1.9.0\flume-ng-configuration-1.9.0.jar;C:\Users\Administrator\.m2\repository\org\apache\flume\flume-ng-configfilters\flume-ng-config-filter-api\1.9.0\flume-ng-config-filter-api-1.9.0.jar;C:\Users\Administrator\.m2\repository\org\apache\flume\flume-ng-auth\1.9.0\flume-ng-auth-1.9.0.jar;C:\Users\Administrator\.m2\repository\org\slf4j\slf4j-api\1.7.25\slf4j-api-1.7.25.jar;C:\Users\Administrator\.m2\repository\com\google\guava\guava\11.0.2\guava-11.0.2.jar;C:\Users\Administrator\.m2\repository\com\google\code\findbugs\jsr305\1.3.9\jsr305-1.3.9.jar;C:\Users\Administrator\.m2\repository\commons-io\commons-io\2.1\commons-io-2.1.jar;C:\Users\Administrator\.m2\repository\commons-codec\commons-codec\1.8\commons-codec-1.8.jar;C:\Users\Administrator\.m2\repository\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;C:\Users\Administrator\.m2\repository\commons-lang\commons-lang\2.5\commons-lang-2.5.jar;C:\Users\Administrator\.m2\repository\org\apache\avro\avro\1.7.4\avro-1.7.4.jar;C:\Users\Administrator\.m2\repository\org\codehaus\jackson\jackson-core-asl\1.8.8\jackson-core-asl-1.8.8.jar;C:\Users\Administrator\.m2\repository\org\codehaus\jackson\jackson-mapper-asl\1.8.8\jackson-mapper-asl-1.8.8.jar;C:\Users\Administrator\.m2\repository\com\thoughtworks\paranamer\paranamer\2.3\paranamer-2.3.jar;C:\Users\Administrator\.m2\repository\org\xerial\snappy\snappy-java\1.0.4.1\snappy-java-1.0.4.1.jar;C:\Users\Administrator\.m2\repository\org\apache\avro\avro-ipc\1.7.4\avro-ipc-1.7.4.jar;C:\Users\Administrator\.m2\repository\org\mortbay\jetty\jetty\6.1.26\jetty-6.1.26.jar;C:\Users\Administrator\.m2\repository\org\mortbay\jetty\jetty-util\6.1.26\jetty-util-6.1.26.jar;C:\Users\Administrator\.m2\repository\org\apache\velocity\velocity\1.7\velocity-1.7.jar;C:\Users\Administrator\.m2\repository\commons-collections\commons-collections\3.2.1\commons-collections-3.2.1.jar;C:\Users\Administrator\.m2\repository\io\netty\netty\3.10.6.Final\netty-3.10.6.Final.jar;C:\Users\Administrator\.m2\repository\joda-time\joda-time\2.9.9\joda-time-2.9.9.jar;C:\Users\Administrator\.m2\repository\org\eclipse\jetty\jetty-servlet\9.4.6.v20170531\jetty-servlet-9.4.6.v20170531.jar;C:\Users\Administrator\.m2\repository\org\eclipse\jetty\jetty-security\9.4.6.v20170531\jetty-security-9.4.6.v20170531.jar;C:\Users\Administrator\.m2\repository\org\eclipse\jetty\jetty-util\9.4.6.v20170531\jetty-util-9.4.6.v20170531.jar;C:\Users\Administrator\.m2\repository\org\eclipse\jetty\jetty-server\9.4.6.v20170531\jetty-server-9.4.6.v20170531.jar;C:\Users\Administrator\.m2\repository\javax\servlet\javax.servlet-api\3.1.0\javax.servlet-api-3.1.0.jar;C:\Users\Administrator\.m2\repository\org\eclipse\jetty\jetty-com.intellij.rt.junit.JUnitStarter -ideVersion5 -junit4 cn.yanqi.dw.hive.udf.ParseJsonArray["101","102","103"]

五、打成jar包并上传到服务器中

### --- 编译打印并上传到服务器中[root@hadoop02 ~]# ll /data/yanqidw/jars/cn.yanqi.dw-1.0-SNAPSHOT-jar-with-dependencies.jar

六、使用自定义UDF函数

### --- 添加UDF自定义函数包~~~ # 语法~~~ 添加开发的jar包(在Hive命令行中)add jar /data/yanqidw/jars/cn.yanqi.dw-1.0-SNAPSHOT-jar-with-dependencies.jar;~~~ # 操作实例[root@hadoop02 ~]# hivehive (default)> add jar /data/yanqidw/jars/cn.yanqi.dw-1.0-SNAPSHOT-jar-with-dependencies.jar;

### --- 创建临时函数~~~ # 语法~~~ 创建临时函数。指定类名一定要完整的路径,即包名加类名create temporary function yanqi_json_array as "cn.yanqi.dw.hive.udf.ParseJsonArray";~~~ # 操作实例hive (default)> create temporary function yanqi_json_array as "cn.yanqi.dw.hive.udf.ParseJsonArray";

### --- 执行查询~~~ # 切换到use test分区下hive (default)> use test;hive (test)> show tables;jsont1

### --- 解析json串中的数组~~~ # 语法~~~ 执行查询~~~ 解析json串中的数组select username, age, sex, yanqi_json_array(json, "ids") ids from jsont1;

~~~ # 操作实例hive (test)> select username, age, sex, yanqi_json_array(json, "ids") > ids from jsont1;OKusername age sex idsuser1 18 male ["101","102","103"]user2 20 female ["201","202","203","204"]user3 23 male ["301","302","303","304","305"]user4 17 male ["401","402","403","304"]user5 35 female ["501","502","503"]

### --- 解析json串中的数组并展开~~~ # 语法~~~ 解析json串中的数组,并展开select username, age, sex, ids1 from jsont1lateral view explode(yanqi_json_array(json, "ids")) t1 as ids1;

~~~ # 操作实例hive (test)> select username, age, sex, ids1 from jsont1 > lateral view explode(yanqi_json_array(json, "ids")) t1 as ids1;OKusername age sex ids1user1 18 male 101user1 18 male 102user1 18 male 103user2 20 female 201user2 20 female 202user2 20 female 203user2 20 female 204user3 23 male 301user3 23 male 302user3 23 male 303user3 23 male 304user3 23 male 305user4 17 male 401user4 17 male 402user4 17 male 403user4 17 male 304user5 35 female 501user5 35 female 502user5 35 female 503

### --- 解析json串中的id和num~~~ # 语法~~~ 解析json串中的id、numselect username, age, sex, id, num from jsont1lateral view json_tuple(json, 'id', 'total_number') t1 as id, num;

~~~ # 操作实例hive (test)> select username, age, sex, id, num from jsont1 > lateral view json_tuple(json, 'id', 'total_number') t1 as id, num;OKusername age sex id numuser1 18 male 1 3user2 20 female 2 4user3 23 male 3 5user4 17 male 4 5user5 35 female 5 3

### --- 解析json串中的数组并展开~~~ # 语法~~~ 解析json串中的数组,并展开select username, age, sex, ids1, id, num from jsont1lateral view explode(yanqi_json_array(json, "ids")) t1 as ids1lateral view json_tuple(json, 'id', 'total_number') t1 as id, num;

~~~ # 操作实例hive (test)> select username, age, sex, ids1, id, num from jsont1 > lateral view explode(yanqi_json_array(json, "ids")) t1 as ids1 > lateral view json_tuple(json, 'id', 'total_number') t1 as id, num;OKusername age sex ids1 id numuser1 18 male 101 1 3user1 18 male 102 1 3user1 18 male 103 1 3user2 20 female 201 2 4user2 20 female 202 2 4user2 20 female 203 2 4user2 20 female 204 2 4user3 23 male 301 3 5user3 23 male 302 3 5user3 23 male 303 3 5user3 23 male 304 3 5user3 23 male 305 3 5user4 17 male 401 4 5user4 17 male 402 4 5user4 17 male 403 4 5user4 17 male 304 4 5user5 35 female 501 5 3user5 35 female 502 5 3user5 35 female 503 5 3

Walter Savage Landor:strove with none,for none was worth my strife.Nature I loved and, next to Nature, Art:I warm'd both hands before the fire of life.It sinks, and I am ready to depart

——W.S.Landor

版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。

上一篇:|NO.Z.00015|——————————|BigDataEnd|——|Hadoop&PB级数仓.V07|——|PB数仓.v07|会员活跃度分析|自定义拦截器实现&测试|
下一篇:USB 4协议带来了Type-C接口
相关文章

 发表评论

暂时没有评论,来抢沙发吧~