Randomized Response (RR) is a protocol for collecting and analyzing categorical data under local differential privacy guarantees. It has been used as a building block of mechanisms deployed by big tech companies to collect data from app and web users. Each user reports an automatically randomized alteration of their true value to the analytics server, which then estimates the histogram of the users' true, unobserved values by applying a debiasing rule that compensates for the added randomness. A known issue is that the standard debiasing rule can yield a vector with negative entries, which cannot be interpreted as a histogram, and there is no consensus on the best fix. An elegant but slow solution is the Iterative Bayesian Update algorithm (IBU), which converges to the Maximum Likelihood Estimate (MLE) as the number of iterations goes to infinity. This paper bypasses IBU by providing a simple formula for the exact MLE of RR, and compares it experimentally with other estimation methods to help practitioners decide which one to use.
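To make the negative-value issue concrete, the following is a minimal sketch, assuming the common k-ary form of RR in which a user reports the true value with probability p = e^ε / (e^ε + k − 1) and each other value with probability q = 1 / (e^ε + k − 1). The abstract does not fix this parameterization, and the function names (`rr_probabilities`, `randomize`, `debias`, `ibu`) are illustrative rather than taken from the paper. The `ibu` function is the textbook expectation-maximization iteration, included only for comparison; the sketch does not attempt to reproduce the paper's closed-form MLE.

```python
import math
import random
from collections import Counter


def rr_probabilities(k, epsilon):
    """Return (p, q) for k-ary RR (assumed parameterization):
    p = probability of reporting the true value,
    q = probability of reporting one specific other value."""
    e = math.exp(epsilon)
    return e / (e + k - 1), 1.0 / (e + k - 1)


def randomize(value, k, epsilon, rng):
    """Client side: keep the true value with probability p,
    otherwise report one of the k - 1 other values uniformly."""
    p, _ = rr_probabilities(k, epsilon)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)
    # Skip over the true value so it is never chosen here.
    return other if other < value else other + 1


def debias(reports, k, epsilon):
    """Server side: standard unbiased inversion of the observed
    frequencies. Entries sum to 1 but may be negative."""
    p, q = rr_probabilities(k, epsilon)
    counts = Counter(reports)
    n = len(reports)
    return [(counts[x] / n - q) / (p - q) for x in range(k)]


def ibu(reports, k, epsilon, iterations=1000):
    """Iterative Bayesian Update (EM) started from the uniform
    histogram. Iterates stay non-negative and sum to 1."""
    p, q = rr_probabilities(k, epsilon)
    counts = Counter(reports)
    n = len(reports)
    freq = [counts[y] / n for y in range(k)]
    theta = [1.0 / k] * k
    for _ in range(iterations):
        # Probability of observing report y under the current estimate.
        marginal = [q + (p - q) * theta[y] for y in range(k)]
        theta = [
            theta[x] * sum(
                freq[y] * (p if y == x else q) / marginal[y]
                for y in range(k)
            )
            for x in range(k)
        ]
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    k, epsilon = 3, 1.0
    # Hypothetical population: nobody holds value 2.
    true_values = [0] * 900 + [1] * 100
    reports = [randomize(v, k, epsilon, rng) for v in true_values]
    print("debiased:", debias(reports, k, epsilon))
    print("IBU:     ", ibu(reports, k, epsilon))
```

When a value is rare or absent in the population, its observed frequency can fall below q, and the inversion in `debias` then returns a negative entry. The IBU iterates cannot go negative, but many iterations may be needed before they stabilize, which is the cost the abstract refers to.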