Social care applications are crucial for improving elderly people's quality of life, and they enable operators to provide early interventions. Accurate prediction of user dropout in healthy ageing applications is essential because dropout is directly related to individual health status. Machine Learning (ML) algorithms have enabled highly accurate predictions, outperforming traditional statistical methods that struggle to cope with individual patterns. However, ML requires a substantial amount of training data, which is difficult to collect because of the presence of personally identifiable information (PII) and the data fragmentation imposed by regulations. In this paper, we present a federated machine learning (FML) approach that minimizes privacy concerns and enables distributed training without transferring individual data. We employ collaborative training by considering individuals and organizations under FML, which models both cross-device and cross-silo learning scenarios. Our approach is evaluated on a real-world dataset with non-independent and identically distributed (non-iid) data among clients, class imbalance, and label ambiguity. Our results show that data selection and class imbalance handling techniques significantly improve the predictive accuracy of models trained under FML, which achieve predictive performance comparable or superior to that of traditional ML models.
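To make the idea of distributed training without transferring individual data concrete, the following is a minimal sketch of one server-side aggregation step. It assumes federated averaging (FedAvg), a common aggregation rule; the abstract does not state which rule the authors use. The function name `federated_average` and the representation of a client update as a `(weights, num_samples)` pair are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation of one client's contribution:
# (locally trained weight vector, number of local training samples).
ClientUpdate = Tuple[List[float], int]


def federated_average(updates: Sequence[ClientUpdate]) -> List[float]:
    """Combine client weight vectors into one global vector.

    Each client's weights are scaled by its share of the total sample
    count, so only model parameters (never raw records) reach the server.
    This is an illustrative FedAvg-style step, not the paper's method.
    """
    if not updates:
        raise ValueError("at least one client update is required")

    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(updates[0][0])
    global_weights = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of equal length")
        share = n / total_samples  # this client's share of all samples
        for i, w in enumerate(weights):
            global_weights[i] += w * share
    return global_weights
```

For example, `federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])` returns `[2.5, 3.5]`, because the second client holds three quarters of the samples. Under non-iid data or class imbalance, plain sample-count weighting can favour large or majority-class clients, which is one reason data selection and imbalance handling matter in this setting.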