Current state-of-the-art methods for differentially private model training are based on matrix factorization techniques. However, these methods suffer from high computational overhead because they require numerically solving a demanding optimization problem to determine an approximately optimal factorization before the actual model training begins. In this work, we present a new matrix factorization approach, BSR, which overcomes this computational bottleneck. By exploiting properties of the standard matrix square root, BSR can efficiently handle even large-scale problems. For the key scenario of stochastic gradient descent with momentum and weight decay, we also derive analytical expressions for BSR that render the computational overhead negligible. We prove bounds on the approximation quality that hold in both the centralized and the federated learning settings. Our numerical experiments demonstrate that models trained using BSR perform on par with the best existing methods, while completely avoiding their computational overhead.
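To illustrate why a square-root factorization needs no numerical optimization, the following is a minimal sketch and not the BSR construction itself, whose details the abstract does not give. It assumes the simplest workload, plain SGD without momentum or weight decay, where the workload matrix A is the lower-triangular all-ones (prefix-sum) matrix. Its square root is lower-triangular Toeplitz, with entries given by the Taylor coefficients of (1 - x)^(-1/2), so the factorization A = B C with B = C = A^(1/2) is available in closed form. The helper names and the error proxy (Frobenius norm of B times the maximum column norm of C) are assumptions made for illustration.

```python
import numpy as np

def sqrt_coefficients(n):
    """First n Taylor coefficients of (1 - x)^(-1/2), i.e. r_k = binom(2k, k) / 4^k."""
    r = np.ones(n)
    for k in range(1, n):
        r[k] = r[k - 1] * (2 * k - 1) / (2 * k)
    return r

def lower_toeplitz(coeffs):
    """Lower-triangular Toeplitz matrix with coeffs[k] on the k-th subdiagonal."""
    n = len(coeffs)
    M = np.zeros((n, n))
    for k in range(n):
        M += coeffs[k] * np.eye(n, k=-k)
    return M

n = 8
A = np.tril(np.ones((n, n)))               # assumed workload: prefix sums (plain SGD)
C = lower_toeplitz(sqrt_coefficients(n))   # closed-form square root of A
B = C                                      # A = B @ C with B = C = A^(1/2)

# Illustrative error proxy (an assumption): ||B||_F times the largest column norm of C.
sensitivity = np.linalg.norm(C, axis=0).max()
error = np.linalg.norm(B, "fro") * sensitivity

# Baseline for comparison: the trivial factorization A = A @ I, whose sensitivity is 1.
baseline_error = np.linalg.norm(A, "fro")
```

Under these assumptions, no optimization problem is solved at any point: the coefficients follow from a one-line recurrence, and for this workload the resulting error proxy is smaller than that of the trivial factorization.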