We study the differentially private top-$k$ selection problem: given $d$ items with scores, the goal is to output a sequence of $k$ items whose scores are approximately the highest. Recent work by Gillenwater et al. (ICML '22) samples directly from the vast space of $d^{\,\Theta(k)}$ possible length-$k$ sequences and achieves better empirical accuracy than previous pure and approximate differentially private methods. Their algorithm runs in $\tilde{O}(dk)$ time and space. In this paper, we present an improved algorithm with time and space complexity $O(d + k^2 / \epsilon \cdot \ln d)$, where $\epsilon$ is the privacy parameter. Experiments show that our algorithm runs orders of magnitude faster than theirs while achieving similar empirical accuracy.
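To make the sampling target concrete, here is a minimal brute-force sketch of a joint exponential mechanism over length-$k$ sequences. It assumes a utility of the form $u(S) = -\max_{i \in [k]} (c_{(i)} - c_{s_i})$, where $c_{(i)}$ is the $i$-th largest score and $S = (s_1, \dots, s_k)$ has distinct items, and it assumes the textbook exponential mechanism with sensitivity 1 (weights $\exp(\epsilon u(S)/2)$). Both are illustrative assumptions, and the function names are hypothetical. The sketch enumerates all $d!/(d-k)!$ sequences, so its cost grows as $d^{\,\Theta(k)}$. It shows only the distribution being sampled, not Gillenwater et al.'s $\tilde{O}(dk)$ algorithm or the faster algorithm of this paper.

```python
import itertools
import math
import random


def joint_utility(counts, seq, sorted_counts):
    """Hypothetical utility: minus the worst gap between the i-th largest
    count and the count of the item placed at position i."""
    return -max(sorted_counts[i] - counts[s] for i, s in enumerate(seq))


def brute_force_joint_em(counts, k, epsilon, rng=random):
    """Sample a length-k sequence of distinct item indices with probability
    proportional to exp(epsilon * u(S) / 2), assuming sensitivity 1.

    Enumerates every ordered k-subset, so it is only usable for tiny d and k.
    """
    sorted_counts = sorted(counts, reverse=True)
    seqs = list(itertools.permutations(range(len(counts)), k))
    utils = [joint_utility(counts, s, sorted_counts) for s in seqs]
    # Subtract the maximum utility before exponentiating for numerical stability.
    top = max(utils)
    weights = [math.exp(epsilon * (u - top) / 2) for u in utils]
    return rng.choices(seqs, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [10, 7, 3, 1]
    print(brute_force_joint_em(counts, k=2, epsilon=1.0, rng=rng))
```

A larger $\epsilon$ concentrates the probability mass on the true top-$k$ order (here `(0, 1)`), while a smaller $\epsilon$ spreads it over sequences with larger gaps. The exponential number of candidate sequences is exactly what motivates more efficient sampling algorithms.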
翻译:我们研究差分隐私的Top-$k$选择问题,目标是从$d$个项目中识别出具有近似最高分数的$k$个项目序列。Gillenwater等人(ICML '22)的最新工作采用了一种直接采样方法,从包含$d^{\,\Theta(k)}$个可能长度为$k$序列的庞大集合中进行采样,与先前的纯差分隐私或近似差分隐私方法相比,显示出更优的实证准确性。他们的算法具有$\tilde{O}(dk)$的时间与空间复杂度。在本文中,我们提出了一种改进算法,其时间与空间复杂度为$O(d + k^2 / \epsilon \cdot \ln d)$,其中$\epsilon$表示隐私参数。实验结果表明,我们的算法运行速度比他们的方法快数个数量级,同时达到相似的实证准确性。