RF fingerprinting is emerging as a physical-layer security scheme for identifying illegitimate and/or unauthorized emitters sharing the RF spectrum. However, because publicly accessible real-world datasets are scarce, most research relies on synthetic waveforms generated with software-defined radios (SDRs), which are not suited to practical deployment settings. On the other hand, the few datasets that are available cover only chipsets that generate a single kind of waveform. Commercial off-the-shelf (COTS) combo chipsets that support two wireless standards (for example, WiFi and Bluetooth) over a shared dual-band antenna, such as those found in laptops, adapters, wireless chargers, and Raspberry Pis, are becoming ubiquitous in the IoT realm. Hence, to keep up with the modern IoT environment, there is a pressing need for real-world open datasets that capture emissions from these combo chipsets transmitting heterogeneous communication protocols. To this end, we present what is, to our knowledge, the first capture of emissions from COTS IoT combo chipsets transmitting WiFi and Bluetooth, recorded in two different time frames. Capturing in different time frames is essential for rigorously evaluating how well fingerprinting models generalize. To ensure widespread use, each capture within the comprehensive 72 GB dataset is long enough (40 MSamples) to support diverse input tensor lengths and formats. Finally, the dataset also comprises emissions at varying signal powers to account for the range from weak to strong signals encountered in real-world settings.
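As an illustration of how a long capture can support diverse input tensor lengths and formats, the following is a minimal sketch rather than the dataset's own tooling. The abstract does not specify the on-disk file format, so the example uses a synthetic complex array as a stand-in for one capture; the function names `slice_capture`, `to_iq_channels`, and `to_polar_channels` are hypothetical.

```python
import numpy as np


def slice_capture(iq, window_len, stride=None):
    """Split a 1-D complex capture into windows of `window_len` samples.

    `stride` defaults to `window_len` (non-overlapping windows).
    Returns an array of shape (n_windows, window_len).
    """
    if stride is None:
        stride = window_len
    if window_len <= 0 or stride <= 0:
        raise ValueError("window_len and stride must be positive")
    n_windows = max(0, (len(iq) - window_len) // stride + 1)
    # Row k holds the sample indices of window k.
    idx = np.arange(window_len)[None, :] + stride * np.arange(n_windows)[:, None]
    return iq[idx]


def to_iq_channels(windows):
    """(n, L) complex -> (n, 2, L) float32 with I and Q as channels."""
    return np.stack([windows.real, windows.imag], axis=1).astype(np.float32)


def to_polar_channels(windows):
    """(n, L) complex -> (n, 2, L) float32 with magnitude and phase as channels."""
    return np.stack([np.abs(windows), np.angle(windows)], axis=1).astype(np.float32)


# Synthetic stand-in for one capture (assumption: samples are complex baseband IQ).
rng = np.random.default_rng(0)
capture = (rng.standard_normal(10_000)
           + 1j * rng.standard_normal(10_000)).astype(np.complex64)

short = to_iq_channels(slice_capture(capture, 256))
long_overlap = to_polar_channels(slice_capture(capture, 1024, stride=512))
print(short.shape)         # (39, 2, 256)
print(long_overlap.shape)  # (18, 2, 1024)
```

The same capture yields either many short I/Q tensors or fewer, longer, overlapping magnitude/phase tensors, so the window length and representation can be chosen per model without re-recording.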