Website Fingerprinting (WFP) has traditionally focused on inferring which website a user visits from encrypted traffic metadata such as packet sizes and timing. In this paper, we identify and quantify a new privacy risk in modern web settings: an adversary can infer a user's persona using only packet-length and inter-arrival-time sequences. To study this risk at scale, we build an LLM-driven multi-agent browsing framework that enforces controllable persona constraints while a computer-use agent interacts with real websites and collects the corresponding encrypted traffic traces. We formalize persona fingerprinting under both closed-set and open-world settings, and we further evaluate whether persona information is already embedded in the representations learned by existing WFP models and whether it can be amplified at low cost. Across 10 modern websites and 15 personas (plus an open-world class), persona inference achieves about 84% accuracy on mixed-site traffic. Moreover, adding a lightweight multi-task objective to an existing WFP model can raise its persona accuracy to around 80% while retaining strong site classification performance (about 93% at baseline). Our results show that, on modern websites, encrypted traffic metadata can leak not only which site a user visits, but also how they browse and who is browsing.
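The abstract does not spell out the form of the lightweight multi-task objective. One common way to build such an objective is a weighted sum of a site-classification loss and a persona-classification loss computed on two heads over a shared representation. The sketch below illustrates that idea only; the function names, the use of cross-entropy, and the `persona_weight` parameter are assumptions made for illustration, not the authors' stated formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of a single example given raw logits and the true class index."""
    return -math.log(softmax(logits)[label])


def multitask_loss(site_logits, site_label,
                   persona_logits, persona_label,
                   persona_weight=0.5):
    """Hypothetical joint objective: site loss plus a weighted persona loss.

    persona_weight is an assumed hyperparameter that trades off the
    auxiliary persona task against the primary site-classification task.
    """
    site_loss = cross_entropy(site_logits, site_label)
    persona_loss = cross_entropy(persona_logits, persona_label)
    return site_loss + persona_weight * persona_loss


if __name__ == "__main__":
    # 10 site classes and 16 persona classes (15 personas plus one open-world class),
    # matching the counts in the abstract; the logits here are placeholders.
    site_logits = [0.0] * 10
    persona_logits = [0.0] * 16
    print(multitask_loss(site_logits, 3, persona_logits, 5))
```

Setting `persona_weight` to zero recovers plain site classification, so under this assumed formulation the weight directly controls how much the persona task is allowed to shape the shared representation.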