Privacy Amplification for Synthetic Data Using Range Restriction

We introduce a new class of range-restricted formal data privacy standards that condition on the data owner's beliefs about sensitive data ranges. Incorporating this additional information yields a stronger privacy guarantee (e.g., an amplification). The range-restricted standards protect only a subset (or ball) of data values and exclude ranges (or balls) believed to be already publicly known. The standards are designed for the risk-weighted pseudo posterior (model) mechanism (PPM), which is used to generate synthetic data under an asymptotic differential privacy (aDP) guarantee. The PPM downweights the likelihood contribution of each record in proportion to its disclosure risk. We adapt the PPM to incorporate owner beliefs by adjusting the risk-weighted pseudo likelihood, and we introduce two alternative adjustments. The first expresses the data owner's knowledge of the sensitive range as a probability, $\lambda$, that a datum value drawn from the underlying generating distribution lies outside the ball or subspace of sensitive values. The portion of each datum's likelihood contribution deemed sensitive is then $(1-\lambda) \leq 1$, and it is the only portion of the likelihood subject to risk downweighting. The second adjustment encodes knowledge as the difference in probability mass, $P(R) \leq 1$, between the edges of the sensitive range, $R$. We then use the resulting conditional (pseudo) likelihood for a sensitive record, which boosts its worst-case tail values away from 0. We compare the privacy and utility properties of the PPM under the aDP and range-restricted privacy standards.
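The two adjustments described above can be illustrated with a minimal Python sketch. The abstract does not give the exact functional forms, so everything here is an assumption made for illustration: a univariate normal likelihood, a per-record risk weight `alpha` in [0, 1], an exponent of `lam + (1 - lam) * alpha` as one plausible reading of the first adjustment, and a likelihood renormalized by $P(R)$ as one plausible reading of the second. All function names are hypothetical and are not taken from the authors' implementation.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma) at y (assumed likelihood)."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(y, mu, sigma):
    """CDF of a Normal(mu, sigma) at y, via the error function."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def ppm_loglik(y, mu, sigma, alpha):
    """Baseline risk-weighted pseudo log-likelihood: alpha * log p(y | theta)."""
    return alpha * normal_logpdf(y, mu, sigma)


def lambda_adjusted_loglik(y, mu, sigma, alpha, lam):
    """Adjustment 1 (assumed form): only the (1 - lam) share is down-weighted.

    The non-sensitive share lam keeps weight 1; the sensitive share
    (1 - lam) receives the risk weight alpha.
    """
    weight = lam + (1.0 - lam) * alpha
    return weight * normal_logpdf(y, mu, sigma)


def range_conditional_loglik(y, mu, sigma, alpha, lo, hi):
    """Adjustment 2 (assumed form): condition on y lying in R = [lo, hi].

    P(R) is the difference in probability mass between the edges of R.
    Dividing by P(R) <= 1 raises the density of a record inside R.
    """
    if not lo <= y <= hi:
        raise ValueError("y must lie inside the sensitive range [lo, hi]")
    p_r = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
    return alpha * (normal_logpdf(y, mu, sigma) - math.log(p_r))


if __name__ == "__main__":
    # Hypothetical tail record under a standard normal model.
    y, mu, sigma, alpha = 3.0, 0.0, 1.0, 0.5
    print("baseline PPM:      ", ppm_loglik(y, mu, sigma, alpha))
    print("lambda-adjusted:   ", lambda_adjusted_loglik(y, mu, sigma, alpha, lam=0.3))
    print("range-conditional: ", range_conditional_loglik(y, mu, sigma, alpha, 2.0, 4.0))
```

Under these assumed forms, setting `lam = 0` recovers the baseline PPM contribution, and conditioning on the range never lowers the pseudo log-likelihood of a record inside $R$, which matches the abstract's remark that tail values are pushed away from 0.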
