We study mean estimation for Gaussian distributions under \textit{personalized differential privacy} (PDP), where each record has its own privacy budget. PDP is commonly considered in two variants: \textit{bounded} and \textit{unbounded} PDP. In bounded PDP, the privacy budgets are public and neighboring datasets differ by replacing one record. In unbounded PDP, neighboring datasets differ by adding or removing a record; consequently, an algorithm must additionally protect participation information, making both the dataset size and the privacy profile sensitive. Existing work has only studied mean estimation over bounded distributions under bounded PDP. Unlike mean estimation for distributions with bounded range, where all elements can be treated equally and only their differing privacy budgets need to be considered, the Gaussian setting is challenging because elements can contribute very differently owing to the unbounded support, so we must jointly consider the privacy information and the data values. The problem becomes even more challenging under unbounded PDP, where the privacy information itself is protected and it becomes unclear how to compute the weights. In this paper, we address these challenges by proposing optimal Gaussian mean estimators under both bounded and unbounded PDP. In each setting, we first derive a lower bound and then design a PDP mean estimator whose upper bound matches the corresponding lower bound up to logarithmic factors.