Assessing the pre-operative risk of lymph node metastases in endometrial cancer patients is a complex and challenging task. In principle, machine learning and deep learning models are flexible and expressive enough to capture the dynamics of clinical risk assessment. However, in this setting we are limited to observational data with quality issues, missing values, small sample size and high dimensionality, and we cannot reliably learn such models from limited data affected by these sources of bias. Instead, we learn a causal Bayesian network, both to mitigate these issues and to leverage the prior knowledge on endometrial cancer available from clinicians and physicians. We introduce a causal discovery algorithm for causal Bayesian networks based on bootstrap resampling, as opposed to the single imputation used in related work. Moreover, we include a context variable to evaluate whether selection bias leads to learning spurious associations. Finally, we discuss the strengths and limitations of our findings given that the data may be missing not at random, which is common in real-world clinical settings.