Web tracking through third-party cookies is considered a threat to users' privacy and is expected to be abandoned in the near future. Google recently proposed the Topics API framework as a privacy-friendly alternative for behavioural advertising. With this approach, the browser builds a user profile from the navigation history, and advertisers can access it. Because the Topics API may become the new standard for behavioural advertising, it is necessary to fully understand how it operates and to identify its possible limitations. This paper evaluates the robustness of the Topics API to a re-identification attack in which an attacker reconstructs a user's profile by accumulating the user's exposed topics over time and later uses it to re-identify the same user on a different website. Using real traffic traces and realistic population models, we find that the Topics API mitigates but cannot prevent re-identification, as there is a sizeable chance that a user's profile is unique within a website's audience. Consequently, the probability of correct re-identification can reach 15-17% in a pool of 1,000 users. We release the code and data used in this work to stimulate further studies and the tuning of the Topics API parameters.
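The re-identification idea can be illustrated with a toy sketch. It is not the method evaluated in the paper: it assumes each site simply takes the union of the topic IDs it observed for a user across epochs, and it links two users only when their accumulated profiles are identical and that profile is unique within the first site's audience. The function names, user labels, and topic IDs are hypothetical, and the sketch ignores the random topics and per-site differences that the Topics API introduces.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional


def accumulate(observations: Iterable[Iterable[int]]) -> FrozenSet[int]:
    """Union of the topic IDs one site saw for one user across epochs."""
    profile = set()
    for epoch_topics in observations:
        profile.update(epoch_topics)
    return frozenset(profile)


def reidentify(
    site_a: Dict[str, FrozenSet[int]],
    site_b: Dict[str, FrozenSet[int]],
) -> Dict[str, Optional[str]]:
    """Link each site-B user to a site-A user with the same unique profile.

    Simplifying assumption: a link is made only on exact profile equality,
    and only if exactly one site-A user has that profile.
    """
    # Group site-A users by their accumulated profile.
    by_profile: Dict[FrozenSet[int], List[str]] = {}
    for user_a, profile in site_a.items():
        by_profile.setdefault(profile, []).append(user_a)

    matches: Dict[str, Optional[str]] = {}
    for user_b, profile in site_b.items():
        candidates = by_profile.get(profile, [])
        # An ambiguous or unseen profile yields no link.
        matches[user_b] = candidates[0] if len(candidates) == 1 else None
    return matches


# Hypothetical observations: one list of topic IDs per epoch.
site_a = {
    "a1": accumulate([[12], [45], [301]]),
    "a2": accumulate([[12], [45]]),
    "a3": accumulate([[45], [12]]),
}
site_b = {
    "b1": accumulate([[301], [12], [45]]),
    "b2": accumulate([[45], [12]]),
}

matches = reidentify(site_a, site_b)
print(matches)
```

In this toy data, `b1` is linked to `a1` because that profile is unique on site A, whereas `b2` stays unlinked because `a2` and `a3` share its profile. This mirrors the abstract's point that the risk comes from profiles that are unique within a website's audience.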