The healthcare industry generates enormous amounts of complex clinical data, which makes disease prediction and detection a complicated process. In medical informatics, making effective and efficient decisions is very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and interesting knowledge from medical datasets in order to diagnose and predict diseases. Heart disease is currently considered one of the most important problems in the healthcare field, and early diagnosis can reduce deaths. DM techniques have proven highly effective for predicting and diagnosing heart disease. This work applies three classification algorithms, namely J48, Random Forest, and Naïve Bayes, to a medical dataset of heart disease and compares their accuracy. We also examine the impact of the feature selection method. A comparative analysis was performed to determine the best technique using the Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The performance of the algorithms was evaluated using standard metrics such as accuracy, sensitivity, and specificity. The results highlight the importance of classification techniques for heart disease diagnosis. We also reduced the number of attributes in the dataset, which significantly improved prediction accuracy. The results indicate that the best algorithm for predicting heart disease was Random Forest, with an accuracy of 99.24%.
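To make the three evaluation metrics concrete, the following minimal Python sketch computes accuracy, sensitivity, and specificity from binary labels. It is an illustration only, not the Weka workflow used in this work: the names `ConfusionCounts`, `count_outcomes`, and `evaluate` are hypothetical, the label `1` is assumed to mean "heart disease present", and the toy labels in the usage note are invented for demonstration and are not taken from the study's dataset.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix cells (hypothetical helper type)."""
    tp: int  # diseased patients predicted as diseased
    fn: int  # diseased patients predicted as healthy
    fp: int  # healthy patients predicted as diseased
    tn: int  # healthy patients predicted as healthy


def count_outcomes(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> ConfusionCounts:
    """Tally the four confusion-matrix cells for a binary problem."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fn=fn, fp=fp, tn=tn)


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(c: ConfusionCounts) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from the counts."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),   # share of correct predictions
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
    }
```

For example, `evaluate(count_outcomes([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))` tallies three true positives, one false negative, one false positive, and one true negative. Reporting sensitivity and specificity alongside accuracy matters in diagnosis because a classifier can score high accuracy on an imbalanced dataset while still missing many diseased patients.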