TrapFlow: Controllable Website Fingerprinting Defense via Dynamic Backdoor Learning

Website fingerprinting (WF) attacks, which covertly monitor user communications to identify the web pages users visit, pose a serious threat to user privacy. Existing WF defenses attempt to reduce attack accuracy by disrupting traffic patterns, but attackers can adapt by retraining their models, rendering these defenses ineffective. Moreover, their high overhead limits their deployability. To overcome these limitations, we introduce TrapFlow, a novel controllable WF defense based on backdoor learning. TrapFlow exploits the tendency of neural networks to memorize subtle patterns: it injects crafted trigger sequences into the traffic of targeted websites, causing the attacker's model to learn incorrect associations during training. If the attacker attempts to adapt by training on this noisy data, TrapFlow ensures that the model internalizes the trigger as a dominant feature, leading to widespread misclassification across unrelated websites. Conversely, if the attacker ignores these patterns and trains only on clean data, the trigger acts as an adversarial patch at inference time and again causes misclassification. To achieve this dual effect, we optimize the trigger using a Fast Levenshtein-like distance to maximize both its learnability and its distinctiveness from normal traffic. Experiments show that TrapFlow reduces the accuracy of the RF attack from 99% to 6% with 74% data overhead. This compares favorably with two state-of-the-art defenses: FRONT reduces accuracy by only 2% at a similar overhead, while Palette reduces attack accuracy only to 32% and incurs 48% more overhead. We further validate the practicality of our method in a real Tor network environment.
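To make the "distinctiveness from normal traffic" criterion concrete, here is a minimal sketch under stated assumptions. It is not TrapFlow's method. Traces are assumed to be packet-direction sequences (+1 outgoing, -1 incoming). The paper's Fast Levenshtein-like distance is replaced by the classic exact Levenshtein edit distance. A plain random search stands in for the trigger optimization, and the learnability objective is omitted entirely. All function names (`levenshtein`, `distinctiveness`, `pick_trigger`, `inject`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Assumed trace encoding: +1 = outgoing packet, -1 = incoming packet.
Trace = List[int]


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic edit distance (unit-cost insert/delete/substitute), O(len(b)) memory.
    Stand-in for the paper's Fast Levenshtein-like distance, which is not specified here."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete x
                         cur[j - 1] + 1,            # insert y
                         prev[j - 1] + (x != y))    # substitute, or match at cost 0
        prev = cur
    return prev[-1]


def min_window_distance(trigger: Trace, trace: Trace) -> int:
    """Smallest edit distance between the trigger and any same-length window of a clean trace."""
    k = len(trigger)
    if len(trace) < k:
        return levenshtein(trigger, trace)
    return min(levenshtein(trigger, trace[s:s + k]) for s in range(len(trace) - k + 1))


def distinctiveness(trigger: Trace, clean_traces: List[Trace]) -> int:
    """How far the trigger is from its closest look-alike in any clean trace (higher = more distinct)."""
    return min(min_window_distance(trigger, t) for t in clean_traces)


def pick_trigger(clean_traces: List[Trace], length: int,
                 n_candidates: int, seed: int = 0) -> Tuple[Trace, int]:
    """Hypothetical random search: keep the candidate that is most distinct from clean traffic."""
    rng = random.Random(seed)
    best: Trace = []
    best_score = -1
    for _ in range(n_candidates):
        cand = [rng.choice((1, -1)) for _ in range(length)]
        score = distinctiveness(cand, clean_traces)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


def inject(trace: Trace, trigger: Trace, position: int) -> Trace:
    """Insert the trigger as dummy packets at `position`; no real packets are removed or delayed."""
    position = max(0, min(position, len(trace)))
    return trace[:position] + list(trigger) + trace[position:]


if __name__ == "__main__":
    clean = [[1, -1, -1, -1, 1, -1, -1, -1], [1, 1, -1, -1, -1, -1, 1, -1]]
    trigger, score = pick_trigger(clean, length=6, n_candidates=200)
    print(trigger, score, inject(clean[0], trigger, position=2))
```

Because injection only adds packets, the data overhead of this toy scheme is simply `len(trigger) / len(trace)` per defended trace. A real defense would also have to weigh learnability and timing, which this sketch ignores.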
