Browser fingerprinting can identify and track users across the Web, even without cookies, by collecting attributes from their devices to create unique "fingerprints". Although this technique and its privacy risks have been studied for over a decade, further research is limited because prior studies used data that is not publicly available, and that data lacked user demographics. Here we provide a first-of-its-kind dataset to enable further research. It includes browser attributes together with users' demographics and survey responses, collected with informed consent from 8,400 US study participants. We use this dataset to demonstrate how fingerprinting risks differ across demographic groups. For example, we find that lower-income users are more at risk, and that as users' age increases, they are both more likely to be concerned about fingerprinting and more likely to be at real risk of being fingerprinted. Furthermore, we demonstrate an overlooked risk: user demographics, such as gender, age, income level, and race, can be inferred from browser attributes commonly used for fingerprinting, and we identify which browser attributes contribute most to this risk. As part of our data collection, we also conducted an experiment, with responses from 12,461 participants in total, to study what affects users' likelihood of sharing browser data for open research, in order to inform future data collection efforts. Female participants were significantly less likely to share their browser data, as were participants who were shown the browser data we asked to collect. Overall, we show the important role of user demographics in ongoing efforts to assess fingerprinting risks and improve user privacy, with findings that can inform future privacy-enhancing browser development. The dataset and data collection tool we provide can be used to study research questions not addressed in this work.
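To make "fingerprinting risk differs across demographic groups" concrete, here is a minimal Python sketch. It is not the authors' method. It assumes each participant is a record with a hypothetical `attrs` dict of browser attributes and a hypothetical `group` label. The sketch hashes the attributes into a fingerprint ID and reports, for each group, the share of participants whose fingerprint is unique in the whole dataset (an anonymity set of size 1), which is one common way to quantify fingerprinting risk. All attribute names, values, and group labels below are made-up toy values, not data from the dataset.

```python
import hashlib
import json
from collections import Counter, defaultdict


def fingerprint(attrs: dict) -> str:
    """Hash a dict of browser attributes into a stable fingerprint ID."""
    # Sort keys so that attribute order does not change the hash.
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def unique_rate_by_group(records, group_key):
    """Per group, the fraction of users whose fingerprint is unique in the dataset."""
    fps = [fingerprint(r["attrs"]) for r in records]
    # Anonymity-set size of each fingerprint across ALL records.
    counts = Counter(fps)
    totals = defaultdict(int)
    uniques = defaultdict(int)
    for record, fp in zip(records, fps):
        g = record[group_key]
        totals[g] += 1
        if counts[fp] == 1:
            uniques[g] += 1
    return {g: uniques[g] / totals[g] for g in totals}


# Hypothetical toy records: attribute names, values, and groups are illustrative only.
records = [
    {"group": "A", "attrs": {"screen": "1366x768", "timezone": "America/Chicago", "fonts": 41}},
    {"group": "A", "attrs": {"screen": "1280x720", "timezone": "America/Denver", "fonts": 38}},
    {"group": "B", "attrs": {"screen": "1920x1080", "timezone": "America/New_York", "fonts": 52}},
    {"group": "B", "attrs": {"screen": "1920x1080", "timezone": "America/New_York", "fonts": 52}},
]

print(unique_rate_by_group(records, "group"))  # {'A': 1.0, 'B': 0.0}
```

In this toy input, both group A users have distinct attribute sets and are therefore uniquely identifiable. The two group B users share identical attributes and hide in an anonymity set of size 2. Uniqueness is computed against the whole dataset rather than within each group, because a tracker sees every user's fingerprint at once.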