Stochastic Gradient Descent (SGD) with gradient clipping is a powerful technique for enabling differentially private optimization. Although prior work has extensively investigated clipping with a constant threshold, private training remains highly sensitive to the choice of threshold, which can be expensive or even infeasible to tune. This sensitivity motivates adaptive approaches such as quantile clipping, which have demonstrated empirical success but lack a solid theoretical understanding. This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD). We demonstrate that QC-SGD suffers from a bias problem similar to that of constant-threshold clipped SGD, and show how this bias can be mitigated through a carefully designed quantile and step-size schedule. Our analysis reveals crucial relationships between quantile selection, step size, and convergence behavior, providing practical guidelines for parameter selection. We extend these results to differentially private optimization, establishing the first theoretical guarantees for DP-QC-SGD. Our findings provide theoretical foundations for a widely used adaptive clipping heuristic and highlight open avenues for future research.
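To make the idea of quantile clipping concrete, the following is a minimal, non-private sketch rather than the algorithm analyzed in the paper. It assumes that the clipping threshold at each step is the empirical q-quantile of the per-sample gradient norms in the current mini-batch, and it uses a least-squares objective purely for illustration. The function names `clip_to_quantile` and `qc_sgd_least_squares`, the constant quantile and step size, and the omission of the noise that a DP variant would add are all assumptions of this sketch.

```python
import numpy as np


def clip_to_quantile(per_sample_grads, q):
    """Clip each row to the empirical q-quantile of the row norms.

    Assumption of this sketch: the threshold is estimated from the
    current mini-batch, not from a separate (private) estimator.
    """
    norms = np.linalg.norm(per_sample_grads, axis=1)
    tau = np.quantile(norms, q)
    # Scale factor min(1, tau / ||g_i||); the floor guards against zero norms.
    scale = np.minimum(1.0, tau / np.maximum(norms, 1e-12))
    return per_sample_grads * scale[:, None], tau


def qc_sgd_least_squares(X, y, q=0.5, step_size=0.05,
                         n_steps=200, batch_size=32, seed=0):
    """Illustrative quantile-clipped SGD on 0.5 * (x @ w - y)^2."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        residual = X[idx] @ w - y[idx]
        grads = residual[:, None] * X[idx]  # one gradient per sample
        clipped, _ = clip_to_quantile(grads, q)
        # Constant q and step size; the paper studies schedules for both.
        w -= step_size * clipped.mean(axis=0)
    return w
```

Because every per-sample gradient above the threshold is rescaled, the averaged update is in general a biased estimate of the mini-batch gradient, which is the effect the abstract refers to as the bias problem.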