There are a number of forums where people participate under pseudonyms. One example is peer review, where the identity of reviewers for any paper is confidential. When participating in these forums, people frequently engage in "batching": executing multiple related tasks (e.g., commenting on multiple papers) at nearly the same time. Our empirical analysis shows that batching is common in two applications we consider: peer review and Wikipedia edits. In this paper, we identify and address the risk of deanonymization arising from linking batched tasks. To protect against linkage attacks, we take the approach of adding delay to the posting time of batched tasks. We first show that under some natural assumptions, no delay mechanism can provide a meaningful differential privacy guarantee. We therefore propose a "one-sided" formulation of differential privacy for protecting against linkage attacks. We design a mechanism that adds zero-inflated uniform delay to events and show it can preserve privacy. We prove that this noise distribution is in fact optimal in minimizing expected delay among mechanisms adding independent noise to each event, thereby establishing the Pareto frontier of the trade-off between the expected delay for batched and unbatched events. Finally, we conduct a series of experiments on Wikipedia and Bitcoin data that corroborate the practical utility of our algorithm in obfuscating batching without introducing onerous delay to a system.
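As a rough illustration of what "zero-inflated uniform delay" could look like in practice, the following Python sketch draws an independent delay for each event: zero with some probability, and otherwise uniform on an interval. The names `p_zero`, `max_delay`, `zero_inflated_uniform_delay`, `delay_events`, and `expected_delay` are hypothetical and do not come from the paper, and the parameter values that would achieve a given one-sided privacy level are not stated in this abstract, so they are left as free inputs here.

```python
import random


def zero_inflated_uniform_delay(p_zero, max_delay, rng=random):
    """Sample one delay: 0 with probability p_zero, else Uniform(0, max_delay).

    p_zero and max_delay are illustrative parameters (assumptions), not
    values taken from the paper.
    """
    if not 0.0 <= p_zero <= 1.0:
        raise ValueError("p_zero must lie in [0, 1]")
    if max_delay < 0:
        raise ValueError("max_delay must be non-negative")
    if rng.random() < p_zero:
        return 0.0  # the "zero-inflated" point mass: post immediately
    return rng.uniform(0.0, max_delay)


def delay_events(timestamps, p_zero, max_delay, rng=random):
    """Add an independent delay to the posting time of each event."""
    return [
        t + zero_inflated_uniform_delay(p_zero, max_delay, rng)
        for t in timestamps
    ]


def expected_delay(p_zero, max_delay):
    """Mean of the mixture: (1 - p_zero) * max_delay / 2."""
    return (1.0 - p_zero) * max_delay / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [100.0, 100.4, 101.1]  # three tasks submitted almost together
    print(delay_events(batch, p_zero=0.3, max_delay=3600.0, rng=rng))
    print(expected_delay(0.3, 3600.0))
```

Because each delay is drawn independently, events submitted in one batch are released at spread-out times, and a delay is never negative, so no event is posted before it was actually submitted. The expected delay of this mixture is `(1 - p_zero) * max_delay / 2`, which is the quantity traded off against privacy in the abstract's description.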