Prior work on automatic tracker detection lacks the means to recognize web page breakage and often resorts to manual analysis to assess the breakage caused by blocking trackers. We introduce Dumviri, which incorporates a breakage detector that automatically detects web page breakage caused by erroneously blocking a resource that the page needs to function properly. This addition allows Dumviri to prevent functional resources from being misclassified as trackers and increases overall detection accuracy. We designed Dumviri to operate on differential features. We further find that these features are agnostic to analysis granularity, which enables Dumviri to predict tracking resources at the request field granularity and thus to handle some mixed trackers. An evaluation on 15K pages shows that Dumviri replicates the labels of human-generated filter lists with an accuracy of 97.44%. Through manual analysis, we found that Dumviri identified previously unreported trackers and that its breakage detector can identify rules that cause web page breakage in commonly used filter lists such as EasyPrivacy. For mixed trackers, Dumviri, the first automated mixed tracker detector, achieves an accuracy of 79.09%. We have confirmed 22 previously unreported unique trackers and 26 unique mixed trackers. We promptly reported these findings to privacy developers, and we will publish our filter lists in uBlock Origin's extended syntax.