Amplification by subsampling is one of the main primitives in machine learning with differential privacy (DP): Training a model on random batches instead of complete datasets results in stronger privacy. This is traditionally formalized via mechanism-agnostic subsampling guarantees that express the privacy parameters of a subsampled mechanism as a function of the original mechanism's privacy parameters. We propose the first general framework for deriving mechanism-specific guarantees, which leverage additional information beyond these parameters to more tightly characterize the subsampled mechanism's privacy. Such guarantees are of particular importance for privacy accounting, i.e., tracking privacy over multiple iterations. Overall, our framework based on conditional optimal transport lets us derive existing and novel guarantees for approximate DP, accounting with Rényi DP, and accounting with dominating pairs in a unified, principled manner. As an application, we analyze how subsampling affects the privacy of groups of multiple users. Our tight mechanism-specific bounds outperform tight mechanism-agnostic bounds and classic group privacy results.
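To make the notion of a mechanism-agnostic guarantee concrete, the following is a minimal sketch of the two classic baselines the abstract refers to, not of the proposed framework. It assumes Poisson subsampling with rate `q` under the add/remove neighboring relation, for which an (ε, δ)-DP mechanism is known to become (log(1 + q(e^ε − 1)), qδ)-DP, and it uses the textbook group privacy bound (kε, k·e^((k−1)ε)·δ) for groups of size k. The function names and the example values are illustrative assumptions and do not come from the paper.

```python
import math
from typing import Tuple


def amplify_by_subsampling(eps: float, delta: float, q: float) -> Tuple[float, float]:
    """Classic mechanism-agnostic amplification bound (assumed setting:
    Poisson subsampling with rate q, add/remove neighboring relation).

    Uses only the base mechanism's (eps, delta), nothing mechanism-specific.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must lie in [0, 1]")
    if eps < 0.0 or not 0.0 <= delta <= 1.0:
        raise ValueError("need eps >= 0 and delta in [0, 1]")
    # log(1 + q * (e^eps - 1)), computed in a numerically stable way
    eps_sub = math.log1p(q * math.expm1(eps))
    return eps_sub, q * delta


def group_privacy(eps: float, delta: float, k: int) -> Tuple[float, float]:
    """Textbook group privacy: (eps, delta)-DP for one user implies
    (k * eps, k * e^((k - 1) * eps) * delta)-DP for a group of k users."""
    if k < 1:
        raise ValueError("group size k must be at least 1")
    return k * eps, min(1.0, k * math.exp((k - 1) * eps) * delta)


if __name__ == "__main__":
    # Hypothetical base mechanism and sampling rate, chosen for illustration only.
    base_eps, base_delta, rate = 1.0, 1e-5, 0.1
    eps_sub, delta_sub = amplify_by_subsampling(base_eps, base_delta, rate)
    print("single user after subsampling:", eps_sub, delta_sub)
    # Baseline for groups: apply group privacy on top of the amplified guarantee.
    print("group of 3 (classic baseline):", group_privacy(eps_sub, delta_sub, 3))
```

Chaining the two functions gives the kind of generic group bound that the abstract reports its mechanism-specific analysis improves on; the sketch itself makes no claim about how large that improvement is.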