How to properly set the privacy parameter in differential privacy (DP) has been an open question since DP was first proposed in 2006. In this work, we demonstrate that influence functions can offer insight into how a specific privacy parameter value will affect a model's test loss in the randomized response-based local DP setting. Our proposed method allows a data curator to select the privacy parameter best aligned with their acceptable privacy-utility trade-off without heavy computation such as extensive model retraining and data privatization. We consider multiple common randomization scenarios, such as performing randomized response over the features and/or the labels, as well as the more complex case of applying a class-dependent label noise correction method to offset the noise incurred by randomization. Further, we provide a detailed discussion of the computational complexity of our proposed approach, including an empirical analysis. Through empirical evaluations, we show that in both binary and multi-class settings, influence functions approximate the true change in test loss that occurs when randomized response is applied over features and/or labels with small mean absolute error, especially when noise correction methods are applied.
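For readers unfamiliar with the randomization step, the following is a minimal sketch of standard k-ary randomized response applied to class labels. It is an illustration only: the abstract does not specify the exact mechanism used in this work, so the keep probability e^ε / (e^ε + k - 1) and the helper names `keep_probability` and `randomized_response` are assumptions made here.

```python
import math
import random


def keep_probability(epsilon, num_classes):
    """Probability of reporting the true label under k-ary randomized response.

    Assumes the standard mechanism: keep with e^eps / (e^eps + k - 1),
    otherwise report one of the other k - 1 labels uniformly at random.
    """
    return math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)


def randomized_response(labels, epsilon, num_classes, rng):
    """Privatize integer labels in {0, ..., num_classes - 1} (hypothetical helper)."""
    p_keep = keep_probability(epsilon, num_classes)
    privatized = []
    for y in labels:
        if rng.random() < p_keep:
            # Report the true label.
            privatized.append(y)
        else:
            # A nonzero offset guarantees a label different from the true one.
            offset = rng.randrange(1, num_classes)
            privatized.append((y + offset) % num_classes)
    return privatized


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [0, 1, 2, 1, 0, 2]
    for eps in (0.1, 1.0, 5.0):
        print(eps, round(keep_probability(eps, 3), 3),
              randomized_response(labels, eps, 3, rng))
```

Smaller values of ε push the keep probability toward 1/k (more noise, stronger privacy), while larger values push it toward 1. This is the trade-off the privacy parameter controls, and estimating its effect on test loss normally requires privatizing the data and retraining for each candidate ε.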