Differential Privacy (DP) is the current gold standard for quantifying privacy. Estimation problems under DP constraints have largely been studied in the literature under the assumption that all users receive the same level of privacy. We consider empirical mean estimation for univariate data and frequency estimation for categorical data, two pillars of data analysis in industry, subject to heterogeneous privacy constraints: each user contributing a sample to the dataset may demand a different privacy level. The dataset itself is assumed to be worst-case, and we study both problems in two formulations -- the correlated and the uncorrelated setting. In the former, the privacy demands and the user data may be arbitrarily correlated; in the latter, the dataset and the privacy demands are independent. We prove optimality results, under both PAC error and mean-squared error, for our proposed algorithms, and experimentally demonstrate superior performance over baseline techniques.
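To make the heterogeneous setting concrete, the following is a minimal sketch of a simple baseline (not the paper's proposed algorithm): for mean estimation of data in [0, 1], each user perturbs their sample with Laplace noise calibrated to their own budget eps_i (satisfying eps_i-local-DP, since the sensitivity is 1), and the aggregator combines the noisy reports with inverse-variance weights, each report having variance 2/eps_i^2. The function name and weighting choice are illustrative assumptions, not from the source.

```python
import math
import random


def private_mean(data, epsilons, seed=None):
    """Illustrative heterogeneous-DP baseline (not the paper's algorithm).

    Each user's value x_i in [0, 1] is released as x_i + Lap(1/eps_i),
    which satisfies eps_i-local-DP for that user. The aggregator then
    takes an inverse-variance weighted mean: Var[Lap(1/eps)] = 2/eps^2,
    so the weights are proportional to eps_i^2.
    """
    rng = random.Random(seed)
    noisy = []
    for x, eps in zip(data, epsilons):
        # Inverse-CDF sample of Laplace(scale=1/eps): u ~ Uniform(-0.5, 0.5),
        # noise = -(1/eps) * sign(u) * ln(1 - 2|u|).
        u = rng.random() - 0.5
        sign = 1.0 if u >= 0 else -1.0
        noise = -(1.0 / eps) * sign * math.log(1.0 - 2.0 * abs(u))
        noisy.append(x + noise)
    # Weight each report by eps_i^2 (inverse of its noise variance).
    weights = [eps ** 2 for eps in epsilons]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, noisy)) / total
```

Under this weighting, users with looser privacy demands (larger eps_i) contribute more to the estimate, which is the basic tension the heterogeneous formulations above are designed to analyze.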