2022-09-21
sparkml: A Full Hands-On Workflow with LogisticRegression (Part 2)
Cross-validation and grid search

Reference: pyspark.ml.tuning

Hyperparameter tuning: grid search and train-validation splitting. The code below builds a parameter grid for logistic regression and evaluates every combination with K-fold cross-validation.

    # Grid search
    import pyspark.ml.classification as cl
    import pyspark.ml.evaluation as ev
    import pyspark.ml.tuning as tune
    from pyspark.ml import Pipeline

    logistic = cl.LogisticRegression(labelCol='INFANT_ALIVE_AT_REPORT')

    grid = tune.ParamGridBuilder() \
        .addGrid(logistic.maxIter, [5, 10, 50]) \
        .addGrid(logistic.regParam, [0.01, 0.05, 0.3]) \
        .build()

    # Define how the candidate models are compared
    evaluator = ev.BinaryClassificationEvaluator(
        rawPredictionCol='probability',
        labelCol='INFANT_ALIVE_AT_REPORT')

    # Evaluate every parameter combination with K-fold cross-validation
    cv = tune.CrossValidator(
        estimator=logistic,
        estimatorParamMaps=grid,
        evaluator=evaluator,
        numFolds=3)

    # The raw data cannot be used directly, so build a feature pipeline first.
    # encoder, featuresCreator and births are not defined in this part;
    # they are presumably carried over from Part 1.
    pipeline = Pipeline(stages=[encoder, featuresCreator])

    # Split the data again and run it through the pipeline
    birth_train, birth_test = births.randomSplit([0.7, 0.3], seed=123)
    data_transformer = pipeline.fit(birth_train)
    data_test = data_transformer.transform(birth_test)

    # cv.fit searches for the best parameter combination;
    # cvModel wraps the best model found
    cvModel = cv.fit(data_transformer.transform(birth_train))
    results = cvModel.transform(data_test)

    # Check performance on the test set
    print(evaluator.evaluate(results, {evaluator.metricName: 'areaUnderROC'}))
    print(evaluator.evaluate(results, {evaluator.metricName: 'areaUnderPR'}))

Output:

    0.735848884034915
    0.6959036715961695

The following code shows the parameters of the best model:

    # Pair each parameter map with its average cross-validation metric
    results = [
        (
            [
                {key.name: paramValue}
                for key, paramValue in zip(params.keys(), params.values())
            ],
            metric
        )
        for params, metric in zip(
            cvModel.getEstimatorParamMaps(),
            cvModel.avgMetrics
        )
    ]

    # Best combination first
    sorted(results, key=lambda el: el[1], reverse=True)[0]

Alternatively:

    param_maps = cvModel.getEstimatorParamMaps()
    eval_metrics = cvModel.avgMetrics

    param_res = []
    for params, metric in zip(param_maps, eval_metrics):
        param_metric = {}
        for key, param_val in zip(params.keys(), params.values()):
            param_metric[key.name] = param_val
        param_res.append((param_metric, metric))

    sorted(param_res, key=lambda x: x[1], reverse=True)

Output:

    [({'maxIter': 50, 'regParam': 0.01}, 0.7406291618177623),
     ({'maxIter': 10, 'regParam': 0.01}, 0.735580969909943),
     ({'maxIter': 50, 'regParam': 0.05}, 0.7355100622938429),
     ({'maxIter': 10, 'regParam': 0.05}, 0.7351586303619441),
     ({'maxIter': 10, 'regParam': 0.3}, 0.7248698034708339),
     ({'maxIter': 50, 'regParam': 0.3}, 0.7214679272915997),
     ({'maxIter': 5, 'regParam': 0.3}, 0.7180255703028883),
     ({'maxIter': 5, 'regParam': 0.01}, 0.7179304617840288),
     ({'maxIter': 5, 'regParam': 0.05}, 0.7173397593133481)]
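To make clearer why the result list above has nine entries and how much work the search does, here is a minimal pure-Python sketch of what a parameter grid and a 3-fold split amount to. It is not Spark's implementation: the helper names `build_param_grid` and `kfold_indices` are hypothetical, and the folds are assigned by row index modulo the fold count for simplicity, whereas Spark assigns rows to folds randomly.

```python
from itertools import product

# Same candidate values as the grid used with ParamGridBuilder above.
grid_values = {
    "maxIter": [5, 10, 50],
    "regParam": [0.01, 0.05, 0.3],
}


def build_param_grid(values):
    """Return one dict per combination (Cartesian product of the value lists).

    Hypothetical helper for illustration; not part of pyspark.
    """
    names = list(values)
    return [
        dict(zip(names, combo))
        for combo in product(*(values[name] for name in names))
    ]


def kfold_indices(n_rows, num_folds):
    """Yield (train_idx, valid_idx) pairs for each fold.

    Simplifying assumption: row i goes to fold i % num_folds, so every row
    is used for validation exactly once.
    """
    for fold in range(num_folds):
        valid = [i for i in range(n_rows) if i % num_folds == fold]
        train = [i for i in range(n_rows) if i % num_folds != fold]
        yield train, valid


param_grid = build_param_grid(grid_values)
num_folds = 3

# Each combination is fitted once per fold during the search.
fits_during_search = len(param_grid) * num_folds

print(len(param_grid))       # 9
print(fits_during_search)    # 27
```

With 3 values for maxIter and 3 for regParam there are 9 combinations, and with numFolds=3 each one is fitted 3 times, so the search itself costs 27 model fits. CrossValidator then refits the best combination once on the full training set, which is why the cost of widening the grid grows quickly.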