Residential proxies (RESIPs), which have emerged in recent years, differ from traditional network proxies (e.g., commercial VPNs) in several respects: they are deployed in residential networks rather than data center networks, they are distributed worldwide across tens of thousands of cities and ISPs, and they operate at a scale of millions of exit nodes. Together, these factors allow RESIP users to effectively masquerade their traffic as that of authentic residential users, which has led to the increasing adoption of RESIP services, especially in malicious online activities. However, the (malicious) usage of RESIPs, i.e., what traffic RESIPs relay, remains insufficiently understood. In particular, previous work on RESIP traffic studied only the maliciousness of web traffic destinations and suspicious patterns in visits to popular websites. A general methodology for capturing large-scale RESIP traffic and analyzing it for security risks is also missing. Furthermore, because many RESIP nodes have been found in corporate networks and are deployed without proper authorization from device owners or network administrators, detecting and blocking RESIP traffic flows is becoming increasingly necessary; unfortunately, this is impeded by the scarcity of realistic RESIP traffic datasets and effective detection methodologies. To fill these gaps, this study designs and implements multiple novel tools: a general framework for deploying RESIP nodes and collecting RESIP traffic in a distributed manner, a RESIP traffic analyzer that efficiently processes RESIP traffic logs and surfaces suspicious traffic flows, and multiple machine-learning-based classifiers that detect, in a timely and accurate manner, whether a given traffic flow is RESIP traffic.