Understanding Routing-Induced Censorship Changes Globally

Internet censorship is pervasive, and significant effort has been dedicated to understanding what is censored and where. Prior censorship studies, however, have identified significant inconsistencies in their results: experiments show unexplained non-determinism thought to be caused by censor load, end-host geographic diversity, or incomplete censorship. These inconsistencies impede a reliable, repeatable, and correct understanding of global censorship. In this work, we investigate the extent to which Equal-Cost Multi-Path (ECMP) routing causes these inconsistencies, and we develop methods to measure and compensate for them. We find that ECMP routing significantly changes observed censorship across protocols and censor mechanisms, and in 17 countries. We identify that previously observed non-determinism or regional variations are attributable to measurements between fixed end-hosts taking different routes based on Flow-ID; i.e., the choice of intra-subnet source IP or ephemeral source port leads to differences in observed censorship. To achieve this, we develop new route-stable censorship measurement methods that allow consistent measurement of DNS, HTTP, and HTTPS censorship. We find that ECMP routing yields censorship changes across 42% of IPs and 51% of ASes, but that the impact is not uniform. We identify numerous causes of this behavior, ranging from likely failed infrastructure to routes to the same end-host taking geographically diverse paths that experience different censorship en route. Finally, we place our results in the context of prior global measurement studies, first exploring the applicability of our findings to previously observed variations, and then demonstrating how specific experiments from two studies could be impacted by ECMP routing and how specific results are explainable by it. Our work points to methods for improving future studies, reducing inconsistencies, and increasing repeatability.
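To illustrate the Flow-ID effect described above, the following is a minimal sketch of how a hash-based ECMP router might map a flow's 5-tuple to one of several equal-cost next hops, and of why changing only the ephemeral source port can move a probe onto a different path. The SHA-256 hash, the number of paths, the function names, and the example addresses are all assumptions made for illustration; real routers use vendor-specific hash functions and inputs, and this is not the measurement method developed in this work.

```python
import hashlib


def ecmp_next_hop(src_ip, src_port, dst_ip, dst_port, proto, n_paths):
    """Toy ECMP model (assumed): hash the 5-tuple, pick one of n_paths next hops."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the digest as an unsigned integer bucket selector.
    return int.from_bytes(digest[:4], "big") % n_paths


def group_ports_by_path(src_ip, dst_ip, dst_port, proto, ports, n_paths):
    """Group candidate source ports by the path index the toy hash assigns."""
    groups = {}
    for port in ports:
        path = ecmp_next_hop(src_ip, port, dst_ip, dst_port, proto, n_paths)
        groups.setdefault(path, []).append(port)
    return groups


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737); purely hypothetical endpoints.
    groups = group_ports_by_path(
        "192.0.2.10", "198.51.100.7", 443, "tcp",
        ports=range(40000, 40016), n_paths=4,
    )
    for path, ports in sorted(groups.items()):
        print(f"path {path}: source ports {ports}")
```

Under this model, two probes between the same pair of end-hosts share a path only if their full flow identifiers match. A measurement that lets the operating system pick a fresh ephemeral source port for each probe may therefore sample several paths, whereas pinning the source IP and source port keeps repeated probes on one path for as long as the routers' hashing and topology stay unchanged.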
