While machine learning has achieved remarkable results across a wide variety of domains, training models often requires large datasets that may need to be collected from different individuals. Because an individual's dataset may contain sensitive information, sharing training data can raise severe privacy concerns. There is therefore a compelling need for privacy-aware machine learning methods, and one effective approach is to leverage the generic framework of differential privacy. Since stochastic gradient descent (SGD) is one of the most commonly adopted methods for large-scale machine learning problems, this work proposes a decentralized differentially private SGD algorithm. In particular, we focus on SGD without replacement because of its favorable structure for practical implementation. Both privacy and convergence analyses are provided for the proposed algorithm. Finally, extensive experiments demonstrate the effectiveness of the proposed method.