Collaborative training of a machine learning model comes with the risk of exposing sensitive or private data. Federated learning offers a way to collectively train a single global model without sharing client data: each client shares only the updated parameters of its local model. A central server aggregates the parameters from all clients and redistributes the aggregated model back to them. Recent findings have shown that even in this setting, private data can be reconstructed using only information about the model parameters. Current mitigation efforts focus mainly on reducing privacy risks on the server side, assuming that other clients will not act maliciously. In this work, we analyze methods for improving the privacy of client data in neural networks with respect to both the server and the other clients. These methods include homomorphic encryption, gradient compression, and gradient noising; we also discuss the possible use of modified federated learning systems such as split learning, swarm learning, and fully encrypted models. We analyze the negative effects of gradient compression and gradient noising on the accuracy of convolutional neural networks used for classification, and we show that data reconstruction is difficult in the case of segmentation networks. Finally, we implement a proof of concept on the NVIDIA Jetson TX2 module used in edge devices and simulate a federated learning process.
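To make the aggregation step concrete, the following is a minimal sketch of FedAvg-style server aggregation using NumPy only; the function name `server_aggregate`, the weighting by local dataset size, and all parameter values are illustrative assumptions, not the implementation evaluated in this work.

```python
import numpy as np

def server_aggregate(client_params, client_sizes):
    """Weighted average of per-client parameter vectors (FedAvg-style).

    client_params: list of 1-D arrays, one updated parameter vector per client.
    client_sizes:  number of local training samples per client, used as weights.
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()                    # normalize to sum to 1
    stacked = np.stack(client_params)           # shape: (n_clients, n_params)
    return (weights[:, None] * stacked).sum(axis=0)

# Example: three clients, each holding a locally updated parameter vector.
rng = np.random.default_rng(0)
clients = [rng.normal(size=10) for _ in range(3)]
global_params = server_aggregate(clients, client_sizes=[100, 50, 150])
```

The server then redistributes `global_params` to all clients for the next local training round.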
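The two client-side defenses analyzed here, gradient noising and gradient compression, can likewise be sketched in a few lines; the Gaussian noise scale, the top-k sparsification rule, and the specific values below are illustrative assumptions rather than the exact configurations studied.

```python
import numpy as np

def add_gaussian_noise(grad, sigma, seed=1):
    """Gradient noising: perturb the update before it leaves the client."""
    noise = np.random.default_rng(seed).normal(scale=sigma, size=grad.shape)
    return grad + noise

def top_k_compress(grad, k):
    """Gradient compression: keep only the k largest-magnitude entries."""
    sparse = np.zeros_like(grad)
    idx = np.argsort(np.abs(grad))[-k:]          # indices of top-k magnitudes
    sparse[idx] = grad[idx]
    return sparse

# A client applies both defenses to its update before sharing it.
grad = np.linspace(-1.0, 1.0, 10)
protected = top_k_compress(add_gaussian_noise(grad, sigma=0.01), k=3)
```

Both transformations make reconstruction from the shared update harder, at the cost of the classification-accuracy degradation analyzed in this work.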