Federated learning (FL) is becoming the default approach for training machine learning models across decentralized Internet-of-Things (IoT) devices. A key advantage of FL is that no raw data are communicated across the network, which provides an immediate layer of privacy. Nevertheless, recent works have demonstrated that data can be reconstructed from the locally trained model updates that are communicated across the network. Many of these works, however, are limited by how the gradients are computed in backpropagation. In this work, we demonstrate that the model weights shared in FL can reveal information about the local data distributions of IoT devices. This leakage could expose sensitive information to malicious actors in a distributed system. We further discuss results showing that injecting noise into model weights does not prevent data leakage unless the noise is large enough to seriously harm the global model accuracy.
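
The noise-injection defense mentioned above can be illustrated with a minimal sketch. The abstract does not state which noise mechanism is used, so this example assumes zero-mean Gaussian noise added independently to every weight before a device shares its update. The function name `add_gaussian_noise`, the parameter `sigma`, and the layer names are hypothetical and chosen only for illustration.

```python
import random


def add_gaussian_noise(weights, sigma, seed=None):
    """Return a noisy copy of `weights` (dict: layer name -> list of floats).

    Assumption: zero-mean Gaussian noise with standard deviation `sigma`
    is added independently to each weight. The source does not specify
    the mechanism, so this is illustrative only.
    """
    rng = random.Random(seed)
    return {
        name: [w + rng.gauss(0.0, sigma) for w in values]
        for name, values in weights.items()
    }


# Hypothetical local model weights held by one IoT device.
local_weights = {"dense.kernel": [0.5, -1.2, 0.3], "dense.bias": [0.0]}

# The device would share the perturbed copy instead of the exact weights.
noisy_weights = add_gaussian_noise(local_weights, sigma=0.1, seed=0)
```

In this sketch `sigma` is the knob behind the trade-off described above: a larger value hides more of the original weights but also moves the shared update further from what the device actually learned.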