The Tor network provides users with strong anonymity by routing their internet traffic through multiple relays. Although Tor encrypts traffic and hides IP addresses, it remains vulnerable to traffic analysis attacks such as website fingerprinting (WF), which achieves increasingly high accuracy even under open-world conditions. In response, researchers have proposed a variety of defenses that seek to obfuscate or reshape traffic traces, including adaptive padding, traffic regularization, traffic morphing, and adversarial perturbation. However, these defenses often entail trade-offs between privacy, usability, and system performance. Despite extensive research, no comprehensive survey yet unifies WF datasets, attack methodologies, and defense strategies. This paper fills that gap by systematically categorizing existing WF research into three key domains: datasets, attack models, and defense mechanisms. We provide an in-depth comparative analysis of techniques, highlight their strengths and limitations under diverse threat models, and discuss emerging challenges such as multi-tab browsing and coarse-grained traffic features. By consolidating prior work and identifying open research directions, this survey serves as a foundation for advancing stronger privacy protection in Tor.