Amplification by subsampling is one of the main primitives in machine learning with differential privacy (DP): training a model on random batches instead of the complete dataset results in stronger privacy guarantees. This is traditionally formalized via mechanism-agnostic subsampling guarantees that express the privacy parameters of a subsampled mechanism as a function of the original mechanism's privacy parameters. We propose the first general framework for deriving mechanism-specific guarantees, which leverage additional information beyond these parameters to characterize the subsampled mechanism's privacy more tightly. Such guarantees are of particular importance for privacy accounting, i.e., tracking privacy over multiple iterations. Overall, our framework, based on conditional optimal transport, lets us derive existing and novel guarantees for approximate DP, accounting with R\'enyi DP, and accounting with dominating pairs in a unified, principled manner. As an application, we analyze how subsampling affects the privacy of groups of multiple users. Our tight mechanism-specific bounds outperform tight mechanism-agnostic bounds and classic group privacy results.
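For context, the mechanism-agnostic guarantees referred to above have a standard closed form; the following is an illustrative sketch of that classic bound (assuming Poisson subsampling with sampling rate $q$), not of the paper's mechanism-specific results. If the base mechanism satisfies $(\varepsilon, \delta)$-DP, then the subsampled mechanism satisfies
\[
\Bigl(\log\bigl(1 + q\,(e^{\varepsilon} - 1)\bigr),\; q\,\delta\Bigr)\text{-DP}.
\]
For instance, $\varepsilon = 1$ and $q = 0.01$ yield an amplified $\varepsilon' = \log\bigl(1 + 0.01\,(e - 1)\bigr) \approx 0.017$, illustrating how small sampling rates shrink the effective privacy parameters using only $(\varepsilon, \delta)$ and no further information about the mechanism.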