Smartwatch health sensor data is increasingly used in smart health applications and patient monitoring, including stress detection. However, such medical data often contains sensitive personal information and is resource-intensive to acquire for research purposes. To address this challenge, we introduce a privacy-aware method for synthesizing multi-sensor smartwatch health readings related to moments of stress. Our method generates synthetic sequence data with Generative Adversarial Networks (GANs) and applies Differential Privacy (DP) safeguards during model training to protect patient information. To ensure the integrity of our synthetic data, we employ a range of quality assessments and check the plausibility of the synthetic data against the original data. To test its usefulness, we train private machine learning models on a commonly used, albeit small, stress detection dataset and explore strategies for augmenting the existing data with our synthetic data. With our GAN-based augmentation methods, we observe improvements in model performance in both non-private (0.45% F1) and private (11.90-15.48% F1) training scenarios. These results underline the potential of differentially private synthetic data for improving utility-privacy trade-offs, especially when few real training samples are available.
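The abstract does not say which mechanism provides the DP safeguards during training. As a minimal sketch, assuming a DP-SGD-style update (per-sample gradient clipping followed by Gaussian noise), the core step might look like the following. The function name `dp_average_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np


def dp_average_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average of per-sample gradients (DP-SGD-style sketch).

    Assumption: gradients are given as a 2-D array, one row per training sample.
    """
    grads = np.asarray(per_sample_grads, dtype=float)

    # L2 norm of each sample's gradient, kept as a column for broadcasting.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)

    # Scale is 1 for gradients already within the bound and < 1 otherwise,
    # so no single sample contributes more than clip_norm to the sum.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale

    # Gaussian noise calibrated to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])

    return (clipped.sum(axis=0) + noise) / grads.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_grads = [[3.0, 4.0], [0.3, 0.4]]  # toy values, not real sensor data
    print(dp_average_gradient(toy_grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

In a GAN setting, an update of this kind would typically be applied to the network that sees the real data, so that the generator only learns from privatized signals. Whether the authors follow that design is not stated in the abstract.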