The IoT market is diverse, with a multitude of vendors offering devices that serve different functions (e.g., speakers, cameras, and vacuum cleaners). In this market, IoT security and observability systems rely on real-time identification techniques to manage these devices effectively. Most existing IoT identification solutions employ machine learning techniques that assume the IoT device, labeled by both its vendor and its function, was observed during the training phase. We tackle a key challenge in IoT labeling: how can an AI solution label an IoT device that has never been seen before and whose label is unknown? Our solution extracts textual features, such as domain names and hostnames, from network traffic and then enriches these features using Google search data together with catalogs of vendors and device functions. The solution also integrates an auto-update mechanism that uses Large Language Models (LLMs) to update these catalogs with emerging device types. Based on the gathered information, the device's vendor is identified by string matching against the enriched features, and its function is then inferred by LLMs and zero-shot classification over a predefined catalog of IoT functions. In an evaluation on 97 unique IoT devices, our function-labeling approach achieved HIT1 and HIT2 scores of 0.7 and 0.77, respectively. To the best of our knowledge, this is the first work to tackle AI-automated IoT labeling.
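The vendor-matching and evaluation steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a small hypothetical vendor catalog, treats "string matching" as a case-insensitive whole-token search over the extracted and enriched text features, and computes HIT@k as the fraction of devices whose true function appears among the top-k ranked predictions. The names `VENDOR_CATALOG`, `match_vendor`, and `hit_at_k` are illustrative only.

```python
import re

# Hypothetical vendor catalog (assumption); in the described system the
# catalog is maintained and auto-updated with the help of LLMs.
VENDOR_CATALOG = ["amazon", "google", "ring", "roborock", "sonos", "tp-link"]


def _contains_token(text, vendor):
    """True if `vendor` occurs in `text` as a whole token (case-insensitive).

    Token boundaries are any non-alphanumeric characters, so "ring" matches
    "ring-doorbell" but not "spring".
    """
    pattern = r"(?<![a-z0-9])" + re.escape(vendor) + r"(?![a-z0-9])"
    return re.search(pattern, text.lower()) is not None


def match_vendor(features, catalog=VENDOR_CATALOG):
    """Return the catalog vendor matched by the most features, or None.

    `features` are textual features such as domain names, hostnames, and
    search-result snippets. Ties are broken by catalog order.
    """
    counts = {v: sum(_contains_token(f, v) for f in features) for v in catalog}
    best = max(catalog, key=lambda v: counts[v])
    return best if counts[best] > 0 else None


def hit_at_k(ranked_predictions, true_labels, k):
    """Fraction of devices whose true label is among the top-k predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        return 0.0
    hits = sum(
        1 for preds, truth in zip(ranked_predictions, true_labels) if truth in preds[:k]
    )
    return hits / len(true_labels)


if __name__ == "__main__":
    features = ["device-metrics-us.amazon.com", "roborock-vacuum-a15", "api.roborock.com"]
    print(match_vendor(features))  # roborock (2 matching features vs. 1 for amazon)

    preds = [["speaker", "camera"], ["camera", "doorbell"], ["plug", "vacuum"]]
    truth = ["speaker", "doorbell", "camera"]
    print(round(hit_at_k(preds, truth, 1), 2))  # 0.33
    print(round(hit_at_k(preds, truth, 2), 2))  # 0.67
```

Whole-token matching is used instead of plain substring search because short vendor names such as "ring" would otherwise produce false positives on unrelated hostnames. The function-inference step (LLMs plus zero-shot classification) would supply the ranked candidate lists that `hit_at_k` scores.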