Internet Service Providers (ISPs) can use device fingerprinting to identify vulnerable IoT devices and prevent threats early. However, because middleboxes are widely deployed in ISP networks, important data such as 5-tuples and flow statistics are often obscured, which invalidates many existing approaches. The high-speed traffic in ISP networks, hundreds of terabytes per day, poses a further challenge. This paper proposes DeviceRadar, an online IoT device fingerprinting framework that achieves accurate, real-time processing in ISP networks using programmable switches. We use "key packets" as the basis of fingerprints, relying only on packet sizes and directions; these packets appear periodically and differ across IoT devices. To exploit them, we propose a packet size embedding model that discovers the spatial relationships between packets. We also design an algorithm to extract the key packets of each device, and propose an approach that jointly considers the spatial relationships and the key packets to produce a neighboring key packet distribution, which serves as the feature vector for machine learning inference. Finally, we design a model transformation method and a feature extraction process to deploy the model on a programmable data plane within its constrained arithmetic operations and memory, achieving line-speed processing. Our experiments show that DeviceRadar achieves state-of-the-art accuracy across 77 IoT devices at 40 Gbps throughput, and requires only 1.3% of the processing time of GPU-accelerated approaches.
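To make the "key packet" idea concrete, the following is a minimal Python sketch and not DeviceRadar's actual algorithm. It assumes each packet is reduced to a timestamp and a signed size (the sign encoding direction), treats a signed size as a key packet when it recurs at roughly regular intervals, and builds a simple normalized histogram over those sizes. The function names, the thresholds `min_count` and `max_cv`, and the histogram feature are all illustrative assumptions; the packet size embedding model and the neighboring key packet distribution described in the abstract are not reproduced here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_key_packets(packets, min_count=3, max_cv=0.2):
    """Return signed sizes that recur at roughly regular intervals.

    packets: iterable of (timestamp_seconds, signed_size), where the sign
    encodes direction (positive = from the device, negative = to the device).
    min_count and max_cv are hypothetical thresholds chosen for illustration.
    """
    times_by_size = defaultdict(list)
    for ts, size in packets:
        times_by_size[size].append(ts)

    key_sizes = set()
    for size, times in times_by_size.items():
        if len(times) < min_count:
            continue
        times.sort()
        gaps = [b - a for a, b in zip(times, times[1:])]
        avg = mean(gaps)
        # Coefficient of variation of the gaps: a low value means periodic.
        if avg > 0 and pstdev(gaps) / avg <= max_cv:
            key_sizes.add(size)
    return key_sizes


def key_packet_histogram(packets, key_sizes):
    """Normalized frequency of each key signed size, in sorted size order."""
    order = sorted(key_sizes)
    counts = {s: 0 for s in order}
    for _, size in packets:
        if size in counts:
            counts[size] += 1
    total = sum(counts.values())
    return [counts[s] / total if total else 0.0 for s in order]
```

Only sizes, directions, and arrival times are used, which matches the constraint that 5-tuples and flow statistics may be obscured by middleboxes. A real deployment on a programmable switch would need integer-only arithmetic and bounded memory, which this sketch does not attempt.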