This paper presents a comparative study of sampling methods within the FedHome framework for personalized in-home health monitoring. FedHome combines federated learning (FL) with generative convolutional autoencoders (GCAE) to train models on decentralized edge devices while preserving data privacy. A notable challenge in this domain is class imbalance: critical events such as falls are underrepresented in health data, which degrades model performance. To address this, the study evaluates six oversampling-based techniques: SMOTE, Borderline-SMOTE, Random OverSampler, SMOTE-Tomek, SVM-SMOTE, and SMOTE-ENN. Each is tested on FedHome's public implementation over 200 training rounds, both with and without stratified K-fold cross-validation. SMOTE-ENN yields the most consistent test accuracy, with the narrowest standard deviation range (0.0167-0.0176). SMOTE-Tomek is the next most stable (0.0160-0.0175). Random OverSampler (0.0155-0.0176), SMOTE (0.0157-0.0180), and SVM-SMOTE (0.0155-0.0180) show wider ranges and therefore more variable performance. These findings highlight the potential of SMOTE-ENN to improve the reliability and accuracy of personalized health monitoring systems within the FedHome framework.
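The stability comparison can be checked with simple arithmetic. The sketch below assumes that "most consistent" refers to the narrowest spread between the lowest and highest reported standard deviation; that reading is an interpretation, not something the abstract states outright. The function and variable names are illustrative, and Borderline-SMOTE is left out because no range is reported for it.

```python
# Standard-deviation ranges of test accuracy, copied from the abstract.
# Borderline-SMOTE is omitted: the abstract reports no range for it.
reported_std_ranges = {
    "SMOTE-ENN": (0.0167, 0.0176),
    "SMOTE-Tomek": (0.0160, 0.0175),
    "Random OverSampler": (0.0155, 0.0176),
    "SMOTE": (0.0157, 0.0180),
    "SVM-SMOTE": (0.0155, 0.0180),
}


def range_widths(ranges):
    """Return {sampler: upper - lower}, rounded to 4 decimal places."""
    return {name: round(hi - lo, 4) for name, (lo, hi) in ranges.items()}


def rank_by_width(ranges):
    """Return sampler names sorted from narrowest to widest range."""
    widths = range_widths(ranges)
    return sorted(widths, key=widths.get)


widths = range_widths(reported_std_ranges)
ranking = rank_by_width(reported_std_ranges)

# Print each sampler with the width of its reported range.
for name in ranking:
    print(f"{name}: {widths[name]:.4f}")
```

Under this reading, SMOTE-ENN has the narrowest range (0.0009) and SVM-SMOTE the widest (0.0025). Note that the width of a range is a different measure from its lower bound: Random OverSampler and SVM-SMOTE both report the smallest minimum standard deviation (0.0155).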