Differentially Private Stochastic Gradient Descent (DPSGD) is widely used to protect sensitive data during the training of machine learning models, but its privacy guarantee often comes at a large cost in model performance due to the lack of tight theoretical bounds quantifying the privacy loss. While recent efforts have achieved more accurate privacy guarantees, they still rely on assumptions that are restrictive in practical applications, such as convexity and intricate parameter requirements, and rarely investigate in depth the impact of the privacy mechanism on the model's utility. In this paper, we provide a rigorous privacy characterization of DPSGD for general L-smooth and non-convex loss functions, revealing that the privacy loss converges over iterations in bounded-domain settings. Specifically, we track the privacy loss over multiple iterations by leveraging the noisy smooth-reduction property, and further establish a comprehensive convergence analysis in different scenarios. In particular, we show that for DPSGD with a bounded domain, (i) the privacy loss can still converge without the convexity assumption, (ii) a smaller bounded diameter can improve both privacy and utility simultaneously under certain conditions, and (iii) the attainable big-O order of the privacy-utility trade-off can be characterized for DPSGD with gradient clipping (DPSGD-GC) and for DPSGD-GC with a bounded domain (DPSGD-DC), respectively, under a μ-strongly convex population risk function. Experiments via membership inference attacks (MIA) in a practical setting validate the insights gained from the theoretical results.
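For concreteness, the sketch below illustrates a single update of the two variants the abstract refers to: DPSGD with per-sample gradient clipping (DPSGD-GC), plus an optional projection step that yields the bounded-domain variant (DPSGD-DC). All identifiers (dpsgd_step, clip_norm, noise_multiplier, domain_radius) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of one DPSGD update with per-sample gradient clipping
# (DPSGD-GC) and an optional projection onto a bounded domain (DPSGD-DC).
# Function and parameter names are illustrative, not from the paper.
import numpy as np

def dpsgd_step(w, per_sample_grads, lr, clip_norm, noise_multiplier,
               domain_radius=None, rng=None):
    """One noisy update of the parameter vector w.

    per_sample_grads : (batch, dim) array of individual example gradients.
    clip_norm        : C, the per-sample L2 clipping threshold.
    noise_multiplier : sigma; Gaussian noise on the summed gradient has
                       standard deviation sigma * C.
    domain_radius    : if given, project w back onto the L2 ball of this
                       radius (the bounded-domain variant, DPSGD-DC).
    """
    rng = rng or np.random.default_rng()
    batch = per_sample_grads.shape[0]

    # Clip each per-sample gradient to L2 norm at most C.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # Average the clipped gradients and add calibrated Gaussian noise
    # (the privacy mechanism whose accumulated loss the paper tracks).
    noise = rng.normal(0.0, noise_multiplier * clip_norm / batch, size=w.shape)
    w = w - lr * (clipped.mean(axis=0) + noise)

    # DPSGD-DC: keep iterates inside a domain of bounded diameter.
    if domain_radius is not None:
        w_norm = np.linalg.norm(w)
        if w_norm > domain_radius:
            w *= domain_radius / w_norm
    return w

# Toy usage with stand-in per-sample gradients.
rng = np.random.default_rng(1)
w = np.zeros(10)
grads = rng.normal(size=(32, 10))
w = dpsgd_step(w, grads, lr=0.1, clip_norm=1.0,
               noise_multiplier=1.0, domain_radius=2.0, rng=rng)
```

The projection step is the only difference between the two variants; shrinking domain_radius tightens the bounded diameter that, per result (ii), can improve privacy and utility simultaneously under certain conditions.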