Deep Learning: a Heuristic Three-stage Mechanism for Grid Searches to Optimize the Future Risk Prediction of Breast Cancer Metastasis Using EHR-based Clinical Data

August 15, 2024

Xia Jiang, Yijun Zhou, Chuhan Xu, Adam Brufsky, Alan Wells

A grid search, at the cost of training and testing a large number of models, is an effective way to optimize the prediction performance of deep learning models. A challenging aspect of grid search is time management: without a good time management scheme, a grid search can easily become a task that will not finish in our lifetime. In this study, we introduce a heuristic three-stage mechanism for managing the running time of low-budget grid searches, together with the sweet-spot grid search (SSGS) and randomized grid search (RGS) strategies for improving model prediction performance, in predicting the 5-year, 10-year, and 15-year risk of breast cancer metastasis. We develop deep feedforward neural network (DFNN) models and optimize them through grid searches. We conduct eight cycles of grid searches by applying our three-stage mechanism and the SSGS and RGS strategies. We conduct various SHAP analyses, including unique ones that interpret the importance of the DFNN-model hyperparameters. Our results show that grid search can greatly improve model prediction. The grid searches we conducted improved the risk prediction of 5-year, 10-year, and 15-year breast cancer metastasis by 18.6%, 16.3%, and 17.3%, respectively, over the average performance of all corresponding models we trained using the RGS strategy. We not only report the best model performance but also characterize grid searches from various aspects, such as their capability of discovering decent models and the unit grid search time. The three-stage mechanism worked effectively: it made our low-budget grid searches feasible and manageable while also helping improve model prediction performance. Our SHAP analyses identified both the clinical risk factors important for predicting the future risk of breast cancer metastasis and the DFNN-model hyperparameters important for predicting performance scores.
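The abstract does not spell out the three stages or the exact RGS procedure, so the following is only a minimal sketch of the general idea it describes: sample a random subset of a hyperparameter grid, stop when a time budget runs out, and report the best score as a relative improvement over the average of all evaluated models. The grid values, the `toy_score` function, and the names `randomized_grid_search` and `percent_improvement` are assumptions made for illustration; they are not taken from the paper, and `toy_score` merely stands in for training and validating a DFNN.

```python
import itertools
import random
import time


def randomized_grid_search(grid, score_fn, n_samples, time_budget_s, seed=0):
    """Score a random subset of grid points until a sample or time limit is hit."""
    rng = random.Random(seed)
    names = sorted(grid)
    # Enumerate every hyperparameter combination in the full grid, then shuffle.
    combos = list(itertools.product(*(grid[n] for n in names)))
    rng.shuffle(combos)

    results = []
    start = time.monotonic()
    for combo in combos[:n_samples]:
        if time.monotonic() - start > time_budget_s:
            break  # stop early rather than overrun the time budget
        params = dict(zip(names, combo))
        results.append((score_fn(params), params))
    return results


def percent_improvement(best, baseline):
    """Relative gain of `best` over `baseline`, in percent."""
    return 100.0 * (best - baseline) / baseline


# Hypothetical grid; the paper's actual hyperparameters are not given in the abstract.
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_layers": [1, 2, 3],
    "dropout": [0.0, 0.25, 0.5],
}


def toy_score(params):
    # Placeholder for a validation score; NOT a real model.
    return (0.6 + 0.05 * params["hidden_layers"]
            - 0.1 * params["dropout"] - params["learning_rate"])


results = randomized_grid_search(grid, toy_score, n_samples=10, time_budget_s=5.0)
scores = [score for score, _ in results]
best_score, best_params = max(results, key=lambda r: r[0])
mean_score = sum(scores) / len(scores)
print(best_params, round(percent_improvement(best_score, mean_score), 1))
```

In this sketch the time check happens between evaluations, so a single slow model can still overrun the budget; a real low-budget search would likely also need per-model limits or a staged estimate of unit search time, which is the kind of concern the three-stage mechanism is said to address.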
