Recent advances in web technologies have made web tracking systems harder than ever to detect and block. In this work, we propose ASTrack, a novel approach to web tracking detection and removal. ASTrack abstracts code structure using Abstract Syntax Trees (ASTs) to selectively identify web tracking functionality shared across multiple web services. This methodology allows us to: (i) effectively detect web tracking code even when evasion techniques are used (e.g., obfuscation, minification, or webpackaging); and (ii) safely remove the portions of code that serve tracking purposes without affecting the legitimate functionality of the website. Our evaluation on the 10k most popular Internet domains shows that ASTrack detects web tracking with high precision (98%), while discovering about 50k pieces of tracking code and more than 3,400 new tracking URLs not previously recognized by the most popular privacy-preserving tools (e.g., uBlock Origin). Moreover, ASTrack achieves a 36% reduction in functionality loss compared with filter lists, one of the safest options available. Using a novel methodology that combines computer vision and manual inspection, we estimate that full functionality is preserved in more than 97% of the websites.
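To illustrate why an AST-based abstraction can withstand renaming and minification, the following is a minimal sketch, not ASTrack's actual implementation. It assumes Python's standard `ast` module applied to Python source as a stand-in for a JavaScript parser, and the helper names (`structural_signature`, `function_fingerprints`) and the sample snippets are hypothetical. Each function is reduced to its node types and tree shape, with identifiers and literal values discarded, and the result is hashed so that structurally identical code can be matched across services.

```python
import ast
import hashlib


def structural_signature(node: ast.AST) -> str:
    """Serialize node types and tree shape only; names and literals are dropped."""
    children = ",".join(structural_signature(c) for c in ast.iter_child_nodes(node))
    return f"{type(node).__name__}({children})"


def function_fingerprints(source: str) -> dict[str, str]:
    """Map each function name in `source` to a hash of its structural signature."""
    tree = ast.parse(source)
    return {
        fn.name: hashlib.sha256(structural_signature(fn).encode()).hexdigest()
        for fn in ast.walk(tree)
        if isinstance(fn, ast.FunctionDef)
    }


# Hypothetical tracking-style function as it might appear on one site.
ORIGINAL = '''
def send_beacon(user_id, page):
    payload = {"uid": user_id, "url": page}
    return post("https://tracker.example/collect", payload)
'''

# The same logic with identifiers and literals rewritten, as a minifier might do.
MINIFIED = '''
def a(b, c):
    d = {"x": b, "y": c}
    return e("https://t.example/c", d)
'''

# Structurally different code that should not match.
UNRELATED = '''
def render(items):
    return [str(i) for i in items]
'''

original_fp = function_fingerprints(ORIGINAL)["send_beacon"]
minified_fp = function_fingerprints(MINIFIED)["a"]
unrelated_fp = function_fingerprints(UNRELATED)["render"]

print("renamed variant matches:", original_fp == minified_fp)
print("unrelated code matches:", original_fp == unrelated_fp)
```

In this sketch the fingerprint depends only on structure, so the renamed variant hashes to the same value while the unrelated function does not. A practical system would need a more tolerant abstraction, since heavier obfuscation also alters the tree shape itself.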