Radiomics is an emerging area of medical imaging data analysis, particularly for cancer. It converts digital medical images into mineable ultra-high-dimensional data. Machine learning algorithms are widely used in radiomics data analysis to develop powerful decision support models that improve precision in diagnosis, assessment of prognosis, and prediction of therapy response. However, machine learning algorithms for causal inference have not previously been employed in radiomics analysis. In this paper, we evaluate the value of machine learning algorithms for causal inference in radiomics. We select three recent competitive variable selection algorithms for causal inference: outcome-adaptive lasso (OAL), generalized outcome-adaptive lasso (GOAL), and causal ball screening (CBS). We use a sure independence screening (SIS) procedure to extend GOAL and OAL to ultra-high-dimensional data, yielding SIS + GOAL and SIS + OAL. We compare SIS + GOAL, SIS + OAL, and CBS using a simulation study and two cancer radiomics datasets, one in osteosarcoma and one in gliosarcoma. Both radiomics studies and the simulation study identified SIS + GOAL as the best-performing variable selection algorithm.
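To make the screening step concrete, the following is a minimal sketch of a generic sure independence screening pass, not the authors' implementation. It assumes that features are ranked by absolute marginal correlation with a response and that the top d = floor(n / log n) are kept, a common default in the SIS literature; the abstract does not state which criterion or cutoff the paper uses. The function name `sis_screen` and the simulated data are hypothetical.

```python
import numpy as np

def sis_screen(X, y, d=None):
    """Keep the d columns of X with the largest absolute marginal
    correlation with y (hypothetical helper, generic SIS sketch)."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))  # assumed default cutoff, not taken from the paper
    d = min(d, p)
    # Center so that the inner product is proportional to the correlation.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom = np.where(denom == 0, np.inf, denom)  # constant columns score 0
    score = np.abs(Xc.T @ yc) / denom
    keep = np.argsort(score)[::-1][:d]  # indices of the d largest scores
    return np.sort(keep), score

# Simulated ultra-high-dimensional setting: p >> n, two true signals.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)

keep, score = sis_screen(X, y)
print(len(keep), "features retained out of", p)
```

In a two-stage pipeline of the kind described above, the retained columns `X[:, keep]` would then be passed to a penalized selector such as OAL or GOAL, which is not sketched here.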