Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker can recover a user's identity from their representation. As an application, we show that the framework is general enough to model important real-world systems such as Chrome's Topics API for interest-based advertising. We complement our theoretical bounds with provably good attack algorithms for re-identification, which we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk, together with a framework for measuring it that can inform real-world applications.