Mutual Information Assisted Ensemble Recommender System for Identifying Critical Risk Factors in Healthcare Prognosis

Purpose: Health recommenders serve as decision support systems that help patients and medical professionals take actions that improve patients' well-being. These systems extract the information most relevant to the end user and so support appropriate decisions. The present study proposes a feature recommender, as part of a disease management system, that identifies and recommends the most important risk factors for an illness. Methods: A novel mutual information and ensemble-based feature ranking approach is proposed for identifying critical risk factors in healthcare prognosis. Results: To establish the effectiveness of the proposed method, experiments were conducted on four benchmark datasets covering diverse diseases: clear cell renal cell carcinoma (ccRCC), chronic kidney disease, Indian liver patient, and cervical cancer risk factors. The proposed recommender is compared with four state-of-the-art methods using recommender-system performance metrics such as average precision@K, precision@K, recall@K, F1@K, and reciprocal rank@K. The method recommends all relevant critical risk factors for ccRCC. It also attains higher accuracy for ccRCC staging than existing methods while using a reduced feature set (96.6% with a support vector machine and 98.6% with a neural network). Moreover, the top two features recommended by the proposed method for ccRCC, namely tumor size and metastasis status, are medically validated by the existing TNM system. Results on the other three datasets are also found to be superior. Conclusion: The proposed recommender can identify and recommend the risk factors that have the most discriminating power for detecting diseases.
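
The ranking metrics named in the Results can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the example feature names other than tumor size and metastasis status, and the set of "relevant" features are all hypothetical. Average precision@K is normalized here by min(K, number of relevant features), which is one common convention; the paper may use a different one.

```python
from typing import Sequence, Set


def _check(k: int, relevant: Set[str]) -> None:
    # Guard against undefined metrics (division by zero).
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not relevant:
        raise ValueError("relevant set must not be empty")


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended features that are relevant."""
    _check(k, relevant)
    return sum(f in relevant for f in ranked[:k]) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant features that appear in the top k."""
    _check(k, relevant)
    return sum(f in relevant for f in ranked[:k]) / len(relevant)


def f1_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Harmonic mean of precision@k and recall@k."""
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 / position of the first relevant feature in the top k, else 0."""
    _check(k, relevant)
    for position, feature in enumerate(ranked[:k], start=1):
        if feature in relevant:
            return 1.0 / position
    return 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Mean of precision@i over the relevant positions i <= k.

    Assumption: normalized by min(k, number of relevant features).
    """
    _check(k, relevant)
    hits = 0
    total = 0.0
    for position, feature in enumerate(ranked[:k], start=1):
        if feature in relevant:
            hits += 1
            total += hits / position
    return total / min(k, len(relevant))


# Hypothetical ranking and ground truth, for illustration only.
ranked_features = ["tumor_size", "metastasis_status", "age", "lymph_nodes", "gender"]
relevant_features = {"tumor_size", "metastasis_status", "lymph_nodes"}

for name, metric in [
    ("precision@3", precision_at_k),
    ("recall@3", recall_at_k),
    ("F1@3", f1_at_k),
    ("reciprocal rank@3", reciprocal_rank_at_k),
    ("average precision@3", average_precision_at_k),
]:
    print(name, round(metric(ranked_features, relevant_features, 3), 3))
```

In this toy case two of the three relevant features appear in the top three, so precision@3, recall@3, and F1@3 all equal 2/3, while reciprocal rank@3 is 1 because the first recommended feature is relevant.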
