Differential privacy (DP) quantifies privacy loss by measuring how much the output distribution changes when an individual is included in the target dataset. Generally imposed as a constraint, DP has become prominent for safeguarding machine learning datasets at industry giants such as Apple and Google. A common way to guarantee DP is to add appropriate noise to query outputs, establishing a statistical defense against privacy attacks such as membership inference and linkage attacks. However, especially for small datasets, existing DP mechanisms occasionally add an excessive amount of noise to the query output, thereby destroying data utility. This is because traditional DP computes privacy loss from the worst-case scenario, i.e., statistical outliers. In this work, to tackle this challenge, we use per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances. In a nutshell, we propose a per-instance noise variance optimization (NVO) game, framed as a common-interest sequential game, and show that its Nash equilibrium (NE) points inherently guarantee pDP for all data instances. In extensive experiments, our proposed pDP algorithm achieved an average performance improvement of up to 99.53% over the conventional DP algorithm in terms of KL divergence.
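To make the small-dataset problem concrete, the following minimal sketch shows a conventional worst-case mechanism, not the NVO game proposed in this work. It assumes a Laplace mechanism on a bounded mean query under replace-one-record neighboring datasets; the names `laplace_noise` and `dp_mean`, the value bounds, and the choice of epsilon are all illustrative assumptions rather than details taken from the paper.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng):
    """Release a noisy mean with a worst-case (global) sensitivity bound.

    Assumption: neighboring datasets differ by replacing one record,
    and every record is clipped to [lower, upper].
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Worst case: one record moves from `lower` to `upper`,
    # regardless of where the actual data lie.
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon
    noisy = sum(clipped) / n + laplace_noise(scale, rng)
    return noisy, scale


rng = random.Random(0)
small = [rng.uniform(40.0, 60.0) for _ in range(20)]
large = [rng.uniform(40.0, 60.0) for _ in range(20000)]

_, scale_small = dp_mean(small, 0.0, 100.0, 1.0, rng)
_, scale_large = dp_mean(large, 0.0, 100.0, 1.0, rng)
# scale_small is 5.0 and scale_large is 0.005: the same worst-case bound
# yields noise 1000 times larger for the 20-record dataset.
```

In this sketch the noise scale depends only on the declared bounds and the dataset size, never on how far any actual record sits from the rest, which is the worst-case behavior that a per-instance calibration aims to avoid.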