A Measurement of Genuine Tor Traces for Realistic Website Fingerprinting

Website fingerprinting (WF) is a dangerous attack on web privacy because it enables an adversary to predict the website a user is visiting, despite the use of encryption, VPNs, or anonymizing networks such as Tor. Previous WF work almost exclusively uses synthetic datasets to evaluate the performance and estimate the feasibility of WF attacks despite evidence that synthetic data misrepresents the real world. In this paper we present GTT23, the first WF dataset of genuine Tor traces, which we obtain through a large-scale measurement of the Tor network and which is intended especially for WF. It represents real Tor user behavior better than any existing WF dataset, is larger than any existing WF dataset by at least an order of magnitude, and will help ground the future study of realistic WF attacks and defenses. In a detailed evaluation, we survey 28 WF datasets published since 2008 and compare their characteristics to those of GTT23. We discover common deficiencies of synthetic datasets that make them inferior to GTT23 for drawing meaningful conclusions about the effectiveness of WF attacks directed at real Tor users. We have made GTT23 available to promote reproducible research and to help inspire new directions for future work.
