CONTEX-T: Contextual Exploitation of Encrypted Traffic for Device Fingerprinting via Transformer Time-Frequency Analysis

The rapid expansion of Internet of Things (IoT) devices has created a pervasive ecosystem in which encrypted wireless communication serves as the primary mechanism for privacy and security protection. While encryption effectively protects message content, contextual information from packet metadata and statistics inadvertently exposes device identities. Various studies have exploited raw packet statistics and their visual representations for device fingerprinting and identification. However, these approaches remain confined to the spatial domain and offer limited feature representation. This paper therefore presents CONTEX-T, a novel framework that exploits device-level information in encrypted traffic metadata using temporal and spectral representations. CONTEX-T first transforms raw packet-length sequences into temporal and spectral representations and then uses vision transformers (ViTs) for device identification. We systematically evaluated multiple time-frequency representation techniques and transformer-based models on encrypted traffic samples from various IoT devices. The experiments show that time-frequency analysis provides a new and rich feature representation, revealing a complex and expanding threat landscape that requires robust countermeasures for IoT security management. CONTEX-T achieved device classification accuracy exceeding 99% while operating passively on observable contextual metadata. This demonstrates that temporal and spectral signatures persist under strong encryption, highlighting a critical attack surface for IoT network security and management.
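To make the first stage of the pipeline concrete, the following is a minimal sketch of how a raw packet-length sequence could be turned into a time-frequency representation. The abstract does not name the specific transforms evaluated, so the short-time Fourier transform (STFT) used here is an assumption, as are the function name `stft_magnitude`, the window and hop sizes, and the synthetic traffic pattern. In a full pipeline the resulting matrix would be rendered as an image and passed to a ViT classifier, which is not shown.

```python
import cmath
import math


def stft_magnitude(lengths, window=16, hop=8):
    """Return one row per frame and one column per non-negative frequency bin.

    Hypothetical helper: the window and hop values are illustrative only.
    """
    # A Hann window reduces spectral leakage at the frame edges.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window - 1))
            for n in range(window)]
    frames = []
    for start in range(0, len(lengths) - window + 1, hop):
        frame = [lengths[start + n] * hann[n] for n in range(window)]
        # Naive DFT over bins 0 .. window // 2 (fine for short windows).
        row = []
        for k in range(window // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window)
                      for n in range(window))
            row.append(abs(acc))
        frames.append(row)
    return frames


# Synthetic example (assumption): a device alternating between
# 60-byte and 1500-byte packets, observed for 64 packets.
packet_lengths = [60 if i % 2 == 0 else 1500 for i in range(64)]
spec = stft_magnitude(packet_lengths)

# Ignoring the DC bin, the strongest bin reflects the alternation rate.
dominant_bin = max(range(1, len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames x", len(spec[0]), "bins; dominant bin:", dominant_bin)
```

Only packet lengths are used, which matches the passive, metadata-only setting described above: the periodic structure of the traffic appears as energy in specific frequency bins even though the payload is never inspected.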
