As the Internet of Things (IoT) grows in popularity, the large number of devices entering the market brings many security vulnerabilities with it. In this environment, IoT device identification serves as a preventive security measure, since identifying the devices is an important step toward detecting the vulnerabilities they suffer from. In this study, we present an end-to-end machine learning pipeline that identifies IoT devices in the Aalto University dataset (IoT devices captures) using Long Short-Term Memory (LSTM) networks. Raw network packet captures (PCAP) are processed into 25 engineered features, which are then arranged as sliding-window time-series sequences. We systematically evaluate sequence lengths from 2 to 20 and find that performance improves approximately linearly up to length 6 and thereafter improves in a wave-like pattern, peaking at length 18. On the final held-out test set with the optimal configuration, the model achieves an accuracy of 79.85% and a macro-averaged F1-score of 75.70% across 27 device classes.
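The sliding-window arrangement described in the abstract can be illustrated with a minimal sketch. The abstract states only that 25 engineered features are arranged into sequences of length 2 to 20; the function name `make_windows`, the stride of 1, and the choice to label each window by its last packet are assumptions made here for illustration and may differ from the actual pipeline.

```python
import numpy as np


def make_windows(features, labels, seq_len):
    """Arrange per-packet feature rows into overlapping windows.

    Assumptions (not taken from the paper): stride of 1, and each
    window is labeled with the label of its last packet.
    """
    features = np.asarray(features, dtype=np.float32)
    labels = np.asarray(labels)
    n_windows = len(features) - seq_len + 1
    if n_windows <= 0:
        # Too few packets to form even one window of this length.
        empty = np.empty((0, seq_len, features.shape[1]), dtype=np.float32)
        return empty, labels[:0]
    # Window i covers packets i .. i + seq_len - 1.
    X = np.stack([features[i:i + seq_len] for i in range(n_windows)])
    y = labels[seq_len - 1:]
    return X, y


if __name__ == "__main__":
    # Hypothetical input: 10 packets from one device, 25 features each.
    packets = np.arange(250).reshape(10, 25)
    device_ids = np.zeros(10, dtype=int)
    X, y = make_windows(packets, device_ids, seq_len=6)
    print(X.shape, y.shape)
```

The resulting array has the shape (windows, sequence length, features) that recurrent layers such as LSTMs typically expect. In practice the windows would probably be built separately for each device capture so that no window mixes packets from different devices; that grouping step is likewise an assumption and is omitted here.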