Residential proxies (RESIPs), which have emerged in recent years, differ from traditional network proxies (e.g., commercial VPNs) in several respects: they are deployed in residential networks rather than data center networks, they are distributed worldwide across tens of thousands of cities and ISPs, and they operate at a scale of millions of exit nodes. Together, these factors allow RESIP users to masquerade their traffic flows as those of authentic residential users, which has driven the growing adoption of RESIP services, especially in malicious online activities. However, the (malicious) usage of RESIPs, i.e., what traffic RESIPs relay, remains insufficiently understood. In particular, previous work on RESIP traffic studied only the maliciousness of web traffic destinations and suspicious patterns in visits to popular websites. A general methodology for capturing RESIP traffic at scale and analyzing it for security risks is also missing. Furthermore, because many RESIP nodes have been found in corporate networks and are deployed without proper authorization from device owners or network administrators, it is increasingly necessary to detect and block RESIP traffic flows, a task impeded by the scarcity of realistic RESIP traffic datasets and effective detection methodologies. To fill these gaps, this study designs and implements several novel tools: a general framework for deploying RESIP nodes and collecting RESIP traffic in a distributed manner, a RESIP traffic analyzer that efficiently processes RESIP traffic logs and surfaces suspicious traffic flows, and multiple machine-learning-based RESIP traffic classifiers that detect, promptly and accurately, whether a given traffic flow is RESIP traffic.