Early detection of chronic diseases benefits healthcare by opening a window for timely intervention. Although numerous prior studies have successfully used machine learning (ML) models for disease diagnosis, they rely heavily on medical data, which are scarce for most patients in the early stages of chronic disease. In this paper, we aim to diagnose hyperglycemia (diabetes), hyperlipidemia, and hypertension (collectively known as 3H) using behavioral data that we collected ourselves, thereby enabling early detection of 3H without medical data collected in clinical settings. Specifically, we collected daily behavioral data from 629 participants over a 3-month study period and, after data preprocessing, trained various ML models. Experimental results show that, using only the participants' uploaded behavioral data, we can achieve accurate 3H diagnoses: 80.2\%, 71.3\%, and 81.2\% for diabetes, hyperlipidemia, and hypertension, respectively. Furthermore, we conduct Shapley analysis on the trained models to identify the most influential features for each disease. The identified features are consistent with those reported in the literature.
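To illustrate what a Shapley analysis computes, the following minimal sketch evaluates exact Shapley values for a single sample. It is an illustration only, not the models or pipeline used in this work: the function `shapley_values`, the toy linear scorer `toy_risk_score`, its weights, and the sample and baseline vectors are all hypothetical, and "absent" features are assumed to be replaced by baseline values.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    A feature outside the coalition is replaced by its baseline value
    (an assumption of this sketch; other conventions exist).
    """
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their real values.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical toy model: a linear score over three unnamed features.
WEIGHTS = [0.5, -0.2, 0.1]


def toy_risk_score(z):
    return sum(w * v for w, v in zip(WEIGHTS, z))


sample = [4.0, 7.0, 2.0]    # hypothetical feature values for one participant
baseline = [2.0, 6.0, 2.0]  # hypothetical reference values

phi = shapley_values(toy_risk_score, sample, baseline)
for idx, contribution in enumerate(phi):
    print(f"feature_{idx}: {contribution:+.3f}")
```

For a linear scorer, each value reduces to the weight times the feature's deviation from the baseline, and the values sum to the difference between the sample's score and the baseline score. Exact enumeration grows exponentially with the number of features, so practical analyses typically rely on approximations.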