The re-identification, or de-anonymization, of users from anonymized data by matching it against publicly available correlated user data has raised privacy concerns, leading to the use of obfuscation as a complementary measure to anonymization. Recent research provides a fundamental understanding of the conditions under which privacy attacks, in the form of database matching, succeed in the presence of obfuscation. Motivated by synchronization errors stemming from the sampling of time-indexed databases, this paper presents a unified framework that considers both obfuscation and synchronization errors and investigates the matching of databases under noisy entry repetitions. By investigating different structures for the repetition pattern, we devise replica detection and seeded deletion detection algorithms and derive sufficient and necessary conditions for successful matching. Finally, we discuss how some variations of the underlying assumptions, such as the adversarial deletion model, seedless database matching, and the zero-rate regime, affect the results. Overall, our results provide insights into the privacy-preserving publication of anonymized and obfuscated time-indexed data, as well as into the closely related problem of the capacity of synchronization channels.