Multimodal Unsupervised Anomaly Detection (UAD) is critical for quality assurance in smart manufacturing, particularly in complex processes such as robotic welding. However, existing methods often suffer from process-logic blindness: they treat process modalities (e.g., real-time video, audio, and sensor signals) and result modalities (e.g., post-weld images) as symmetric feature sources, thereby ignoring the inherent unidirectional physical generative logic. Furthermore, the heterogeneity gap between high-dimensional visual data and low-dimensional sensor signals frequently causes critical process context to be drowned out. In this paper, we propose Physic-HM, a multimodal UAD framework that explicitly incorporates physical inductive bias to model the process-to-result dependency. Specifically, our framework introduces two key innovations: a Sensor-Guided PHM Modulation mechanism that uses low-dimensional sensor signals as context to guide high-dimensional audio-visual feature extraction, and a Physic-Hierarchical architecture that enforces a unidirectional generative mapping to identify anomalies that violate physical consistency. Extensive experiments on the Weld-4M benchmark demonstrate that Physic-HM achieves a state-of-the-art (SOTA) I-AUROC of 90.7%. The source code of Physic-HM will be released upon acceptance of the paper.
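The Sensor-Guided PHM Modulation idea, where a low-dimensional sensor context gates high-dimensional audio-visual features, can be sketched in FiLM-style conditioning. This is a minimal illustrative sketch, not the paper's actual implementation; the class name, dimensions, and the affine (scale-and-shift) form of the modulation are assumptions.

```python
# Hypothetical sketch of sensor-guided feature modulation (FiLM-style).
# Names and the affine conditioning form are illustrative assumptions;
# the actual PHM Modulation mechanism may differ.
import torch
import torch.nn as nn


class SensorGuidedModulation(nn.Module):
    def __init__(self, sensor_dim: int, feat_dim: int):
        super().__init__()
        # Map low-dimensional sensor context to per-channel scale and shift.
        self.to_gamma_beta = nn.Linear(sensor_dim, 2 * feat_dim)

    def forward(self, av_feat: torch.Tensor, sensor: torch.Tensor) -> torch.Tensor:
        # av_feat: (B, feat_dim) audio-visual features
        # sensor:  (B, sensor_dim) low-dimensional sensor signals
        gamma, beta = self.to_gamma_beta(sensor).chunk(2, dim=-1)
        # Sensor context scales and shifts the audio-visual features,
        # so process signals steer feature extraction instead of being
        # drowned out by the higher-dimensional modality.
        return (1 + gamma) * av_feat + beta


mod = SensorGuidedModulation(sensor_dim=8, feat_dim=64)
out = mod(torch.randn(2, 64), torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 64])
```

The point of the affine form is that even a few sensor channels can rescale every feature channel, preserving process context through the fusion step.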