As third-party cookie blocking becomes the norm in browsers, advertisers and trackers have started to use first-party cookies for tracking. We conduct a differential measurement study on 10K websites with third-party cookies allowed and blocked. The study reveals that first-party cookies are used to store and exfiltrate identifiers to known trackers even when third-party cookies are blocked. Unlike third-party cookie blocking, outright first-party cookie blocking is not practical because it would result in major functionality breakage. We propose CookieGraph, a machine learning-based approach that can accurately and robustly detect first-party tracking cookies. CookieGraph detects first-party tracking cookies with 90.20% accuracy, outperforming the state-of-the-art CookieBlock approach by 17.75%. We show that CookieGraph is fully robust against cookie name manipulation, while CookieBlock's accuracy drops by 15.68%. Blocking all first-party cookies causes major breakage on 32% of the sites with SSO logins, and CookieBlock reduces this to 10%; we show that CookieGraph does not cause any major breakage on these sites. Our deployment of CookieGraph shows that first-party tracking cookies are used on 93.43% of the 10K websites. We also find that first-party tracking cookies are set by fingerprinting scripts. The most prevalent first-party tracking cookies are set by major advertising entities such as Google, Facebook, and TikTok.
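To make the notion of identifier exfiltration concrete, the following is a minimal sketch of one way to flag first-party cookies whose values reappear in requests to other sites. It is not CookieGraph, which is a machine learning-based classifier, and it is not the measurement pipeline used in the study. The function names (`site_of`, `candidate_ids`, `find_exfiltrated`), the 8-character threshold, the two-label site approximation, and the example domains are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def site_of(host):
    # Assumption: approximate the registrable domain by the last two labels.
    # A real measurement would consult the Public Suffix List instead.
    parts = host.lower().strip(".").split(".")
    return ".".join(parts[-2:])


def candidate_ids(cookie_value, min_len=8):
    # Split the cookie value on non-alphanumeric delimiters and keep the
    # segments long enough to plausibly be identifiers (assumed threshold).
    segments = re.split(r"[^A-Za-z0-9]+", cookie_value)
    return {seg for seg in segments if len(seg) >= min_len}


def find_exfiltrated(page_url, cookies, request_urls):
    # Return (cookie name, receiving site) pairs for first-party cookies
    # whose identifier-like segments appear in a request to another site.
    page_site = site_of(urlparse(page_url).hostname or "")
    hits = []
    for name, value in cookies.items():
        ids = candidate_ids(value)
        for url in request_urls:
            dest_site = site_of(urlparse(url).hostname or "")
            if dest_site == page_site:
                continue  # same-site request, not exfiltration
            if any(identifier in url for identifier in ids):
                hits.append((name, dest_site))
    return hits


# Hypothetical data: one identifier-like cookie and one preference cookie.
example_cookies = {"_uid": "GA1.2.1234567890.1600000000", "theme": "dark"}
example_requests = [
    "https://shop.example.com/api?x=1234567890",
    "https://tracker.test/collect?cid=1234567890.1600000000",
    "https://cdn.test/lib.js",
]
hits = find_exfiltrated("https://shop.example.com/", example_cookies, example_requests)
print(hits)  # → [('_uid', 'tracker.test')]
```

This plain substring match misses identifiers that are hashed, encoded, or sent in request bodies, so it should be read as a lower bound on exfiltration rather than a detector.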