This paper critically examines machine-learning-based device identification and addresses common pitfalls in the existing literature. We analyze the trade-offs between unique and class-based identification, along with data heterogeneity, feature extraction challenges, and evaluation metrics. By highlighting specific errors, such as improper data augmentation and misleading session identifiers, we provide robust guidelines that researchers can use to improve the reproducibility and generalizability of IoT security models.