Metric differential privacy (DP) provides heterogeneous privacy guarantees based on the distance between a pair of inputs. It is a popular notion of privacy because it captures the natural privacy semantics of many applications (such as location data) and yields better utility than standard DP. However, prior work on metric DP has primarily focused on the \textit{item-level} setting, where every user reports only a single data item. A more realistic setting is that of user-level DP, where each user contributes multiple items and privacy is desired at the granularity of the user's \textit{entire} contribution. In this paper, we initiate the study of metric DP at the user level. Specifically, we use the earth-mover's distance ($d_\textsf{EM}$) as the metric underlying our notion of privacy, since it captures both the magnitude and the spatial aspects of changes in a user's data. We make three main technical contributions. First, we design two novel mechanisms under $d_\textsf{EM}$-DP to answer linear queries and item-wise queries. In particular, our analysis of the latter involves a generalization of the privacy amplification by shuffling result, which may be of independent interest. Second, we provide a black-box reduction from the general unbounded setting to bounded $d_\textsf{EM}$-DP (where the size of the dataset is fixed and public) via a novel sampling-based mechanism. Third, we show that our proposed mechanisms provably provide improved utility over user-level DP for certain types of linear queries and for frequency estimation.
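To illustrate why an earth-mover's distance reflects both how many of a user's items change and how far they move, the following is a minimal sketch for the special case of equal-size multisets of points on the real line, with unit mass per item and absolute difference as the ground cost. The function name `earth_movers_distance_1d` and the unnormalized total-cost convention are assumptions made for this illustration; the paper's definition of $d_\textsf{EM}$ may use a different ground metric or normalization.

```python
def earth_movers_distance_1d(xs, ys):
    """Unnormalized earth-mover's distance between two equal-size
    multisets of real numbers (illustrative assumption: unit mass per
    item, ground cost |a - b|).

    In one dimension with equal masses, an optimal transport plan
    matches the points in sorted order, so the distance is the sum of
    absolute differences between the sorted sequences.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-size datasets")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys)))


# A hypothetical user holding four items, all at location 0.
base = [0, 0, 0, 0]

# One item moves a short distance: small change.
one_item_near = [1, 0, 0, 0]

# One item moves a long distance: the spatial aspect dominates.
one_item_far = [10, 0, 0, 0]

# Every item moves a short distance: the magnitude aspect dominates.
all_items_near = [1, 1, 1, 1]

print(earth_movers_distance_1d(base, one_item_near))   # 1
print(earth_movers_distance_1d(base, one_item_far))    # 10
print(earth_movers_distance_1d(base, all_items_near))  # 4
```

Under a metric notion of privacy, datasets that are closer in this distance would be harder to distinguish than datasets that are farther apart, whereas a standard user-level guarantee treats all three modified datasets as the same kind of change to one user's contribution.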