Logical anomalies are violations of predefined constraints on object quantity, spatial layout, and compositional relationships in industrial images. While prior work largely treats anomaly detection as a binary decision, such formulations cannot indicate which logical rule is broken and therefore offer limited value for quality assurance. We introduce Logical Anomaly Classification (LAC), a task that unifies anomaly detection and fine-grained violation classification in a single inference step. To tackle LAC, we propose LogiCls, a vision-language framework that decomposes complex logical constraints into a sequence of verifiable subqueries. We further present a data-centric instruction synthesis pipeline that generates chain-of-thought (CoT) supervision for these subqueries, coupling precise grounding annotations with diverse image-text augmentations to adapt vision-language models (VLMs) to logic-sensitive reasoning. Training is stabilized by a difficulty-aware resampling strategy that emphasizes challenging subqueries and long-tail constraint types. Extensive experiments demonstrate that LogiCls delivers robust, interpretable, and accurate industrial logical anomaly classification, providing both the predicted violation categories and their evidence trails.
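The abstract does not specify how difficulty-aware resampling is computed. As a minimal illustrative sketch (not the authors' actual method), one common way to emphasize challenging subqueries and long-tail constraint types is to assign each training record a sampling weight that grows with its difficulty and with the rarity of its constraint type; the function name, the `alpha`/`beta` exponents, and the difficulty signal below are all hypothetical.

```python
from collections import Counter

def resample_weights(records, alpha=1.0, beta=0.5):
    """Hypothetical sketch of difficulty-aware resampling weights.

    records: list of (constraint_type, difficulty) pairs, where
    difficulty in [0, 1] could be e.g. 1 minus the model's running
    accuracy on that subquery (an assumed signal, not from the paper).
    Returns per-record sampling probabilities that up-weight hard
    subqueries and long-tail constraint types.
    """
    freq = Counter(ctype for ctype, _ in records)
    n = len(records)
    weights = []
    for ctype, diff in records:
        rarity = (n / freq[ctype]) ** beta   # rarer constraint types get larger weight
        hardness = (1e-3 + diff) ** alpha    # harder subqueries get larger weight
        weights.append(rarity * hardness)
    total = sum(weights)
    return [w / total for w in weights]

# Example: "count" is frequent and easy; "composition" is rare and hard.
records = [("count", 0.1), ("count", 0.1), ("count", 0.1),
           ("layout", 0.6), ("composition", 0.9)]
w = resample_weights(records)
```

Under this scheme, the rare and difficult "composition" record receives a much larger sampling probability than each frequent, easy "count" record, which is the intended effect of emphasizing long-tail constraints during training.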