Web Privacy based on Contextual Integrity: Measuring the Collapse of Online Contexts

The collapse of social contexts has been amplified by digital infrastructures, yet it has received surprisingly little attention from Web privacy scholars. Users are persistently identified within and across distinct Web contexts, to varying degrees, by different websites and trackers, and so lose the ability to maintain a fragmented identity. To systematically evaluate this structural privacy harm, we operationalize the theory of Privacy as Contextual Integrity and measure persistent user identification within and between distinct Web contexts. We crawl the top-700 popular websites across seven contexts (health, finance, news & media, LGBTQ, eCommerce, adult, and education) for 27 days, and create network graphs to learn how persistent browser identification via third-party cookies and JavaScript fingerprinting is diffused within and between Web contexts. Past work measured Web tracking in bulk, highlighting the volume of trackers and tracking techniques. These measurements miss a crucial privacy implication of Web tracking: the collapse of online contexts. Our findings reveal how persistent browser identification varies between and within contexts, how far user IDs are diffused, how this diffusion contrasts with known tracking distributions across websites, and whether it is conducted jointly or separately via cookie IDs and JS fingerprinting. Our network analysis informs the construction of browsers' storage containers to protect users against real-time context collapse. This is a first modest step in measuring Web privacy as Contextual Integrity, opening new avenues for contextual Web privacy research.
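To make the graph-based measurement concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that each crawl observation can be reduced to a (website, context, persistent identifier) triple, links two websites whenever the same identifier is seen on both, and counts how many links stay within one context versus cross between contexts. The site names, identifiers, and the function names `build_edges` and `summarize` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical observations: (website, context, persistent identifier seen there).
# The identifier stands in for a cookie ID or a fingerprint hash.
observations = [
    ("clinic.example", "health", "id-A"),
    ("pharmacy.example", "health", "id-A"),
    ("bank.example", "finance", "id-A"),
    ("daily.example", "news", "id-B"),
    ("shop.example", "ecommerce", "id-B"),
    ("loans.example", "finance", "id-C"),
]


def build_edges(observations):
    """Link every pair of websites on which the same identifier was observed."""
    context_of = {}
    sites_by_id = defaultdict(set)
    for site, context, identifier in observations:
        context_of[site] = context
        sites_by_id[identifier].add(site)

    edges = set()
    for sites in sites_by_id.values():
        # Sorting keeps each undirected edge in one canonical orientation.
        for a, b in combinations(sorted(sites), 2):
            edges.add((a, b))
    return context_of, edges


def summarize(context_of, edges):
    """Count edges that stay inside one context versus edges that cross contexts."""
    within = sum(1 for a, b in edges if context_of[a] == context_of[b])
    return {"within": within, "between": len(edges) - within}


context_of, edges = build_edges(observations)
summary = summarize(context_of, edges)
print(summary)
```

In this toy input, an edge between contexts is one instance of the context collapse the abstract describes: the same identifier links a health site to a finance site. A real analysis would also need to decide which cookie values or fingerprints count as persistent identifiers, which this sketch takes as given.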
