Construction of a preeclampsia prediction model based on machine learning

Chen Ziwei; Chen Zhiren; Huang Zehua; Cao Yan; Wang Han; Wang Pei'an

doi:10.3969/j.issn.2096-3882.2023.08.005

Abstract

Objective: To construct preeclampsia prediction models using the CatBoost algorithm and the logistic regression (LR) algorithm, providing a reference for the early prevention and treatment of women at high risk of preeclampsia.

Methods: A total of 1 325 pregnant women who received prenatal examinations and delivered at Xuzhou Central Hospital from January 2012 to December 2021 were enrolled. The study group comprised 461 women with preeclampsia. The control group comprised 864 women with normal pregnancies, randomly selected from the same period. Physical examination data, demographic characteristics, and the results of blood routine, urine routine, and biochemical tests recorded during hospitalization were collected and analyzed retrospectively. Statistical analysis identified the independent factors influencing the development of preeclampsia. These factors served as model inputs, and grid search was used to find the optimal hyperparameters of the LR and CatBoost algorithms. The resulting prediction models were then evaluated for predictive performance.

Results: The LR model performed best with C=100, penalty="l2", and solver="liblinear", achieving AUC=0.976 9, accuracy=0.944 7, precision=0.959 0, recall=0.873 1, and F1=0.914 1. The CatBoost model performed best with depth=5, iterations=500, l2_leaf_reg=1, learning_rate=0.1, and rsm=0.5, achieving AUC=0.983 0, accuracy=0.952 3, precision=0.967 5, recall=0.888 1, and F1=0.926 1.

Conclusions: Both risk prediction models showed good predictive performance and can effectively predict the occurrence of preeclampsia. Both have potential value for the early identification of preeclampsia in clinical practice.
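
The abstract states only that grid search selected the LR and CatBoost hyperparameters. It does not give the candidate grids, the validation scheme, or the selection metric. The following standard-library Python sketch illustrates exhaustive grid search under explicit assumptions, and it is not the authors' pipeline. Synthetic three-feature data stand in for the clinical variables. A single train/validation split replaces whatever resampling the study used, and validation AUC serves as the selection score. The candidate values for C are hypothetical; only C=100 comes from the abstract. The L2-penalized logistic regression is fitted by plain gradient descent, treating C as the inverse regularization strength as scikit-learn does. The liblinear solver itself is not reproduced. All function names (fit_l2_logreg, auc, grid_search, make_data, val_auc) are illustrative.

```python
import itertools
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2_logreg(X, y, C, lr=0.5, epochs=300):
    """Gradient descent on mean log-loss + ||w||^2 / (2 * C * n).

    Dividing scikit-learn's objective 0.5*||w||^2 + C*sum(log-loss) by C*n
    gives this form, so a larger C means weaker regularization.
    The intercept b is not penalized (a simplification in this sketch).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + w[j] / (C * n))
        b -= lr * grad_b / n
    return w, b


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, -math.inf
    for values in itertools.product(*(param_grid[k] for k in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def make_data(n, rng):
    """Hypothetical 3-feature data whose label follows a noisy linear logit."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(2.0 * x[0] - x[1] + 0.5 * x[2]) else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_val, y_val = make_data(100, rng)


def val_auc(params):
    """Fit on the training split, then score AUC on the held-out split."""
    w, b = fit_l2_logreg(X_train, y_train, C=params["C"])
    scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X_val]
    return auc(y_val, scores)


# Hypothetical candidate values; only C=100 is taken from the abstract.
best, best_auc = grid_search({"C": [0.01, 0.1, 1, 10, 100]}, val_auc)
print(best, round(best_auc, 4))
```

grid_search accepts any dictionary of candidate lists. A CatBoost search would therefore pass keys such as depth, iterations, l2_leaf_reg, learning_rate, and rsm, together with a score function that trains and evaluates a CatBoost model. Each added hyperparameter multiplies the number of fits, which is why exhaustive search gets expensive for boosted trees. With scikit-learn installed, GridSearchCV with scoring="roc_auc" runs the same exhaustive search with cross-validation.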

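Accuracy, precision, recall, and F1 are all derived from the confusion matrix. F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R), so it can be checked directly against the other two reported figures. The sketch below assumes that precision and recall refer to the preeclampsia (positive) class. It defines the four metrics and recomputes F1 from the precision and recall reported for each model. The function names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are true cases
    recall = tp / (tp + fn)     # share of true cases that are detected
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1(precision, recall),
    }


# Precision and recall as reported in the abstract for the two tuned models.
reported = {"LR": (0.9590, 0.8731), "CatBoost": (0.9675, 0.8881)}
for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1(p, r):.4f}")
```

Run as written, this prints F1 = 0.9140 for LR and F1 = 0.9261 for CatBoost. The LR value differs from the reported 0.914 1 only in the last digit. That gap falls within the rounding of the four-decimal precision and recall it is computed from.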