We study the top-$k$ selection problem under the differential privacy model: $m$ items are rated according to the votes of a set of clients. We consider a setting in which algorithms retrieve data via a sequence of accesses, each either a random access or a sorted access; the goal is to minimize the total number of data accesses. Our algorithm requires only $O(\sqrt{mk})$ expected accesses; to our knowledge, this is the first sublinear data-access upper bound for this problem. Our analysis also shows that the well-known exponential mechanism requires only $O(\sqrt{m})$ expected accesses. To complement these upper bounds, we develop the first lower bounds for the problem, in three settings: only random accesses; only sorted accesses; and a sequence of accesses of either kind. We show that, to avoid $\Omega(m)$ access cost, supporting *both* kinds of access is necessary, and that in this case our algorithm's access cost is optimal.
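For orientation, the following is a minimal sketch of the textbook exponential mechanism and a simple top-$k$ selection built from it by repeated "peeling". It is a baseline illustration only, not the sublinear-access algorithm of this paper: it reads all $m$ scores, so its access cost is linear. The function names, the `sensitivity` parameter (assumed here to be 1, i.e. one client changes any item's score by at most 1), and the even split of the privacy budget across the $k$ rounds are assumptions made for the example.

```python
import math
import random


def exponential_mechanism(scores, epsilon, sensitivity=1.0, rng=random):
    """Sample index i with probability proportional to
    exp(epsilon * scores[i] / (2 * sensitivity)).

    Baseline version: touches every score (m accesses).
    """
    scale = epsilon / (2.0 * sensitivity)
    top = max(scores)  # shift by the max so exp() cannot overflow
    weights = [math.exp(scale * (s - top)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def top_k_by_peeling(scores, k, epsilon, sensitivity=1.0, rng=random):
    """Select k distinct items by running the exponential mechanism k times,
    removing the chosen item each round.

    Assumption: the budget is split evenly (epsilon / k per round), which
    gives epsilon-DP overall by basic composition.
    """
    remaining = list(range(len(scores)))  # original indices still available
    chosen = []
    for _ in range(k):
        sub_scores = [scores[i] for i in remaining]
        j = exponential_mechanism(sub_scores, epsilon / k, sensitivity, rng)
        chosen.append(remaining.pop(j))
    return chosen


if __name__ == "__main__":
    votes = [12, 47, 3, 45, 30]  # hypothetical vote counts for m = 5 items
    print(top_k_by_peeling(votes, k=2, epsilon=1.0))
```

With a small `epsilon` the output is noisy and may miss the true top items; as `epsilon` grows, the selection concentrates on the highest-scoring items.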