Determining the most informative features for predicting the overall survival of patients diagnosed with high-grade gastroenteropancreatic neuroendocrine neoplasms is crucial for improving individual treatment plans and the biological understanding of the disease. Recently developed ensemble feature selectors such as the Repeated Elastic Net Technique for Feature Selection (RENT) and the User-Guided Bayesian Framework for Feature Selection (UBayFS) allow the user to identify such features in datasets with low sample sizes. While RENT is purely data-driven, UBayFS can integrate expert knowledge into the feature selection process a priori. In this work, we compare both feature selectors on a dataset comprising 63 patients and 134 features from multiple sources, including basic patient characteristics, baseline blood values, tumor histology, imaging, and treatment information. Our experiments involve data-driven and expert-driven setups, as well as combinations of both. We use findings from clinical literature as a source of expert knowledge. Our results demonstrate that both feature selectors allow accurate predictions, and that expert knowledge has a stabilizing effect on the feature set, while its impact on predictive performance is limited. The features WHO Performance Status, Albumin, Platelets, Ki-67, Tumor Morphology, Total MTV, Total TLG, and SUVmax are the most stable and predictive features in our study.