Semantic Identification of IoT Devices from Behavioral Primitives

Accurate identification of IoT devices is important for security management and policy enforcement. Existing approaches typically learn device signatures from packets or flow records. These methods operate on low-level communication observations whose traffic patterns may vary across deployments, software versions, and user interactions. This paper studies device identification using Manufacturer Usage Description (MUD) profiles. MUD profiles describe device behavior using Access Control Entries (ACEs), where each ACE represents a behavioral primitive consisting of protocol, endpoint, direction, and port semantics derived from device communication policy. Our contributions are threefold. First, using 28 publicly available MUD profiles containing 1,023 ACE instances, we construct ACE-level semantic representations from compact behavioral text and analyze their geometric properties. ACE-level representations preserve device-level behavioral distinctions more effectively than whole-profile embeddings and remain effective after whitening calibration. Second, we evaluate semantic ACE matching under controlled runtime variations, including unseen ACEs, drifted hostnames, and partial runtime observation. Exact ACE matching performs well when the overlap with the canonical MUD profile remains high, but degrades sharply when the overlap becomes sparse or disappears. In contrast, semantic ACE matching preserves useful identification evidence across these conditions. Third, we evaluate the same approach on real IoT traffic traces comprising more than 800,000 observed flows. Exact overlap remains the strongest signal when stable overlap exists, while semantic ACE matching provides stronger identification evidence during the early stages of observation, frequently retains the correct device among the highest-ranked candidates, and remains effective under sparse-overlap runtime traffic.
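To make the contrast between exact and semantic ACE matching concrete, the sketch below scores a runtime observation against two candidate profiles. It is an illustration under stated assumptions, not the paper's pipeline: the `ACE` record, the `text()` rendering, the hostnames, and the two toy profiles are all hypothetical, and character-trigram Jaccard similarity stands in for the learned semantic embeddings so that the example runs with the standard library only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ACE:
    """Hypothetical behavioral primitive: protocol, endpoint, direction, port."""
    protocol: str
    endpoint: str
    direction: str
    port: int

    def text(self) -> str:
        # Compact behavioral text for one ACE (format is an assumption).
        return f"{self.direction} {self.protocol} {self.endpoint} port {self.port}"


def trigrams(s: str) -> set:
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def soft_sim(a: ACE, b: ACE) -> float:
    # Stand-in for embedding similarity: Jaccard overlap of character trigrams.
    ta, tb = trigrams(a.text()), trigrams(b.text())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def exact_score(observed, profile) -> float:
    # Fraction of observed ACEs that appear verbatim in the profile.
    obs = set(observed)
    return len(obs & set(profile)) / len(obs) if obs else 0.0


def soft_score(observed, profile) -> float:
    # Mean best-match similarity of each observed ACE against the profile.
    if not observed or not profile:
        return 0.0
    return sum(max(soft_sim(o, p) for p in profile) for o in observed) / len(observed)


def rank(observed, profiles, score):
    # Candidate device names, best score first.
    return sorted(profiles, key=lambda name: score(observed, profiles[name]), reverse=True)


# Toy profiles with made-up hostnames.
profiles = {
    "plug": [
        ACE("tcp", "cloud.plug.example.net", "from-device", 8883),
        ACE("udp", "ntp.example.org", "from-device", 123),
    ],
    "camera": [
        ACE("tcp", "api.cam.example.com", "from-device", 443),
        ACE("udp", "ntp.example.org", "from-device", 123),
    ],
}

# Runtime observation with a drifted hostname (api2 instead of api).
observed = [ACE("tcp", "api2.cam.example.com", "from-device", 443)]

exact = {name: exact_score(observed, aces) for name, aces in profiles.items()}
soft = {name: soft_score(observed, aces) for name, aces in profiles.items()}
best_soft = rank(observed, profiles, soft_score)[0]
```

In this toy case the drifted hostname removes all exact overlap, so `exact` carries no evidence for either device, while the soft score still favors the camera profile. A real system would replace `soft_sim` with similarity between ACE-level embeddings.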
