In the domain of Image Anomaly Detection (IAD), existing methods frequently lack fine-grained, interpretable semantic information, so the anomalous entities or actions they detect are susceptible to hallucination and lack sufficient explanation. In this paper, we propose a novel anomaly detection approach, termed Hoi2Anomaly, that achieves precise discrimination and localization of anomalies. First, we construct a multi-modal instruction-tuning dataset comprising human-object interaction (HOI) pairs in anomalous scenarios. Second, we train an HOI extractor on threat scenarios to localize and match anomalous actions and entities. Finally, we generate explanatory content for each detected anomalous HOI by fine-tuning a vision-language pre-training (VLP) framework. Experimental results demonstrate that Hoi2Anomaly surpasses existing generative approaches in both precision and explainability. We will release Hoi2Anomaly to advance the field of anomaly detection.
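For concreteness, the three-stage pipeline described above can be summarized in the following minimal sketch. All names here (`HoiPair`, `detect_anomalies`, and the `extractor` and `explainer` callables) are hypothetical illustrations of the described design, not the released Hoi2Anomaly API.

```python
# Hypothetical sketch of the Hoi2Anomaly pipeline: an HOI extractor localizes
# and matches anomalous actions and entities, and a fine-tuned VLP model
# generates an explanation for each anomalous interaction it finds.
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class HoiPair:
    """One human-object interaction detected in an image."""
    human_box: List[float]   # [x1, y1, x2, y2] person bounding box
    object_box: List[float]  # [x1, y1, x2, y2] object bounding box
    verb: str                # interaction label, e.g. "holds"
    obj: str                 # object label, e.g. "knife"
    anomaly_score: float     # confidence that this interaction is anomalous


def detect_anomalies(
    image,
    extractor: Callable[[object], List[HoiPair]],      # stage 2: HOI extractor
    explainer: Callable[[object, HoiPair], str],       # stage 3: fine-tuned VLP
    threshold: float = 0.5,
) -> List[Tuple[HoiPair, str]]:
    """Localize anomalous HOI pairs, then generate an explanation for each."""
    pairs = extractor(image)
    anomalous = [p for p in pairs if p.anomaly_score >= threshold]
    return [(p, explainer(image, p)) for p in anomalous]
```

In this reading, stage 1 (the instruction-tuning dataset of HOI pairs in anomalous scenarios) supplies the training data for both callables, so the detection output is grounded in localized interactions rather than whole-image labels.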