We study the top-$k$ selection problem under the differential privacy model: $m$ items are rated according to the votes of a set of clients. We consider a setting in which an algorithm retrieves data via a sequence of accesses, each either a random access or a sorted access; the goal is to minimize the total number of data accesses. Our algorithm requires only $O(\sqrt{mk})$ expected accesses; to our knowledge, this is the first sublinear data-access upper bound for this problem. Accompanying this, we develop the first lower bounds for the problem, in three settings: only random accesses; only sorted accesses; and a sequence of accesses of either kind. We show that supporting \emph{either} kind of access at each step, i.e., the freedom to mix, is necessary to avoid $\Omega(m)$ access cost, and that in this case our algorithm's access cost is almost optimal.
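
To make the access-cost measure concrete, the following is a minimal sketch of an access-counting interface. It assumes that a sorted access returns the next item in descending score order and that a random access returns the score of a named item; the abstract does not spell out these semantics. The class `AccessOracle` and its method names are hypothetical. The sketch is not differentially private and is not the algorithm summarized above; it only shows how the two kinds of access could be tallied.

```python
from typing import List, Tuple


class AccessOracle:
    """Hypothetical access model that counts sorted and random accesses."""

    def __init__(self, scores: List[int]) -> None:
        # scores[i] is the number of votes for item i (assumed representation).
        self._scores = list(scores)
        # Item indices in descending score order; ties broken by item index.
        self._order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
        self._cursor = 0
        self.sorted_accesses = 0
        self.random_accesses = 0

    def sorted_access(self) -> Tuple[int, int]:
        """Return the next (item, score) pair in descending score order."""
        if self._cursor >= len(self._order):
            raise IndexError("no items left for sorted access")
        item = self._order[self._cursor]
        self._cursor += 1
        self.sorted_accesses += 1
        return item, self._scores[item]

    def random_access(self, item: int) -> int:
        """Return the score of the requested item."""
        self.random_accesses += 1
        return self._scores[item]

    @property
    def total_accesses(self) -> int:
        """Total access cost: every access of either kind counts once."""
        return self.sorted_accesses + self.random_accesses


# Usage: two sorted accesses followed by one random access.
oracle = AccessOracle([5, 9, 2, 7])
top_two = [oracle.sorted_access() for _ in range(2)]
score_of_item_0 = oracle.random_access(0)
```

In this sketch an algorithm is free to interleave `sorted_access` and `random_access` calls, which corresponds to the mixed setting; restricting it to one method corresponds to the other two settings.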