The growth and heterogeneity of IoT devices create security challenges, and static device-identification models can degrade as traffic evolves. This paper presents a two-stage, flow-feature-based pipeline for unsupervised IoT device traffic profiling and incremental model updating, evaluated on selected long-duration captures from the Deakin IoT dataset. For baseline profiling, density-based clustering (DBSCAN) isolates a substantial portion of the data as outliers and achieves the strongest alignment with ground-truth device labels among the classical methods tested (NMI 0.78), outperforming centroid-based clustering on cluster purity. For incremental adaptation, we evaluate stream-oriented clustering approaches and find that BIRCH supports efficient updates (0.13 seconds per update) and forms comparatively coherent clusters for a held-out novel device (purity 0.87), but it captures only part of the novel traffic (share 0.72) and incurs a measurable trade-off in known-device accuracy after adaptation (0.71). Overall, the results highlight a practical trade-off between high-purity static profiling and the flexibility of incremental clustering for evolving IoT environments.
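The two headline metrics above, cluster purity and normalized mutual information (NMI), can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names, the toy device labels, and the choice of arithmetic-mean normalization for NMI are assumptions made here for illustration, since the abstract does not state which NMI variant or outlier handling was used.

```python
from collections import Counter
from math import log


def purity(true_labels, cluster_ids):
    """Fraction of samples that carry the majority true label of their cluster."""
    by_cluster = {}
    for t, c in zip(true_labels, cluster_ids):
        by_cluster.setdefault(c, Counter())[t] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((k / n) * log(k / n) for k in Counter(labels).values())


def nmi(true_labels, cluster_ids):
    """Mutual information divided by the arithmetic mean of the two entropies.

    Assumption: arithmetic-mean normalization; other normalizations exist.
    """
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_ids))
    p_true = Counter(true_labels)
    p_clus = Counter(cluster_ids)
    mi = sum(
        (k / n) * log(k * n / (p_true[t] * p_clus[c]))
        for (t, c), k in joint.items()
    )
    denom = (entropy(true_labels) + entropy(cluster_ids)) / 2
    # Both partitions trivial (one label each): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


# Hypothetical toy data: six flows from two devices, split into two clusters.
devices = ["cam", "cam", "cam", "plug", "plug", "plug"]
clusters = [0, 0, 1, 1, 1, 1]
print(f"purity = {purity(devices, clusters):.3f}")
print(f"NMI    = {nmi(devices, clusters):.3f}")
```

In this toy case one camera flow lands in the cluster dominated by the plug, so five of the six flows match their cluster's majority device. How outlier points flagged by DBSCAN enter these metrics (excluded, or counted as their own cluster) is a design choice that the sketch leaves open.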