Industrial deployment of robotic visual anomaly detection (VAD) is fundamentally constrained by passive perception under diverse 6-DoF pose configurations and unstable operating conditions such as illumination changes and shadows, where intrinsic semantic anomalies and physical disturbances coexist and interact. To overcome these limitations, a paradigm shift from passive feature learning to Active Canonicalization is proposed. PiCo (Pose-in-Condition Canonicalization) is introduced as a unified framework that actively projects observations onto a condition-invariant canonical manifold. PiCo operates as a two-stage cascade. The first stage, Active Physical Canonicalization, enables a robotic agent to reorient objects so that geometric uncertainty is reduced at its source. The second stage, Neural Latent Canonicalization, adopts a three-level denoising hierarchy: photometric processing at the input level, latent refinement at the feature level, and contextual reasoning at the semantic level. Together, these levels progressively eliminate nuisance factors across representational scales. Extensive evaluations on the large-scale M2AD benchmark demonstrate the superiority of this paradigm. In static settings, PiCo achieves a state-of-the-art 93.7% O-AUROC, a 3.7% improvement over prior methods, and it attains 98.5% accuracy in active closed-loop scenarios. These results indicate that active manifold canonicalization is critical for robust embodied perception.