Privacy noise may negate the benefits of using adaptive optimizers in differentially private model training. Prior works typically address this issue by using auxiliary information (e.g., public data) to boost the effectiveness of adaptive optimization. In this work, we explore techniques to estimate and efficiently adapt to gradient geometry in private adaptive optimization without auxiliary data. Motivated by the observation that adaptive methods can tolerate stale preconditioners, we propose differentially private adaptive training with delayed preconditioners (DP^2), a simple method that constructs delayed but less noisy preconditioners to better realize the benefits of adaptivity. Theoretically, we provide convergence guarantees for our method for both convex and non-convex problems, and analyze trade-offs between delay and privacy noise reduction. Empirically, we explore DP^2 across several real-world datasets, demonstrating that it can improve convergence speed by as much as 4x relative to non-adaptive baselines and match the performance of state-of-the-art optimization methods that require auxiliary data.
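To make the delayed-preconditioner idea concrete, the following is a minimal NumPy sketch, not the authors' algorithm. It assumes a least-squares objective, an RMSProp-style diagonal preconditioner built from the squared average of the last `delay` privatized gradients, and a simple refresh-every-`delay`-steps schedule. The names `privatize` and `train_delayed` and all hyperparameter values are hypothetical, and no privacy accounting is performed. The sketch only illustrates that averaging noisy gradients over a window yields a staler but less noisy preconditioner.

```python
import numpy as np


def privatize(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / clipped.shape[0]


def train_delayed(X, y, steps=100, delay=10, lr=0.05, clip_norm=1.0,
                  noise_multiplier=1.0, batch_size=32, eps=1e-2, seed=0):
    """Least-squares training with a preconditioner refreshed every `delay` steps."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    precond = np.ones(d)      # all ones: the first window behaves like plain DP-SGD
    window_sum = np.zeros(d)  # running sum of privatized gradients in this window
    for t in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        residual = X[idx] @ w - y[idx]
        per_example = residual[:, None] * X[idx]  # gradient of 0.5 * (x.w - y)^2
        g = privatize(per_example, clip_norm, noise_multiplier, rng)
        # Update with the stale preconditioner computed from the previous window.
        w = w - lr * g / (np.sqrt(precond) + eps)
        window_sum += g
        if (t + 1) % delay == 0:
            # Averaging `delay` noisy gradients shrinks the noise variance by
            # roughly a factor of `delay` before squaring (assumed estimator).
            precond = (window_sum / delay) ** 2
            window_sum = np.zeros(d)
    return w, precond
```

In this sketch a larger `delay` gives a less noisy preconditioner that lags further behind the current iterate, which is the trade-off between delay and privacy noise reduction described above.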