Medical diagnosis is a crucial task in the medical field, as it underpins accurate classification and the choice of appropriate treatment. Near-precise decisions based on a correct diagnosis can directly affect a patient's life, and a misclassification can, in extreme cases, lead to catastrophic outcomes. Several traditional machine learning (ML) methods, such as support vector machines (SVMs) and logistic regression, and state-of-the-art tabular deep learning (DL) methods, including TabNet and TabTransformer, have been proposed and applied to tabular medical datasets. More recently, ensemble methods have gained traction in the field owing to their superior performance, lower computational cost, and easier optimization across different tasks. They offer a powerful alternative for supporting medical decision-making in several diagnosis tasks. In this study, we investigated the benefits of ensemble methods, particularly Gradient Boosting Decision Tree (GBDT) algorithms, for medical classification tasks on tabular data, focusing on XGBoost, CatBoost, and LightGBM. The experiments demonstrate that GBDT methods outperform both traditional ML and deep neural network architectures and achieve the highest average rank across several benchmark tabular medical diagnosis datasets. Furthermore, they require far less computational power than DL models, making them the most favorable option in terms of the trade-off between high performance and low complexity.
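To make the "highest average rank" criterion concrete, the following is a minimal sketch of how such a ranking is commonly computed; it is not taken from this study. It assumes every model is evaluated on every dataset with a higher-is-better metric (e.g., accuracy or AUC), and that tied models share the mean of their positions. The function name `average_ranks`, the dataset names, and all scores are hypothetical placeholders, not results reported here.

```python
from statistics import mean


def average_ranks(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average rank of each model across datasets (1 = best).

    Assumes (hypothetically) that every model has a score on every dataset
    and that higher scores are better. Tied models receive the mean of the
    positions they occupy.
    """
    models = sorted({m for per_ds in scores.values() for m in per_ds})
    ranks: dict[str, list[float]] = {m: [] for m in models}
    for per_model in scores.values():
        # Order models from best to worst score on this dataset.
        ordered = sorted(per_model.items(), key=lambda kv: -kv[1])
        i = 0
        while i < len(ordered):
            # Extend j over the run of models tied with ordered[i].
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            # 0-based positions i..j map to ranks i+1..j+1; share their mean.
            tied_rank = (i + 1 + j + 1) / 2
            for k in range(i, j + 1):
                ranks[ordered[k][0]].append(tied_rank)
            i = j + 1
    # Average each model's per-dataset ranks.
    return {m: mean(r) for m, r in ranks.items()}


if __name__ == "__main__":
    # Hypothetical placeholder scores, for illustration only.
    scores = {
        "dataset_a": {"XGBoost": 0.91, "CatBoost": 0.92, "SVM": 0.85, "TabNet": 0.88},
        "dataset_b": {"XGBoost": 0.80, "CatBoost": 0.80, "SVM": 0.78, "TabNet": 0.75},
    }
    print(average_ranks(scores))
    # → {'CatBoost': 1.25, 'SVM': 3.5, 'TabNet': 3.5, 'XGBoost': 1.75}
```

Averaging ranks rather than raw scores keeps datasets with very different score ranges from dominating the comparison; a lower average rank indicates a model that is consistently near the top.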