Visual anomaly detection in the built environment is a valuable tool for applications such as infrastructure assessment, construction monitoring, security surveillance, and urban planning. Anomaly detection approaches are typically unsupervised and work by detecting deviations from an expected state, with no assumptions made about the exact type of deviation. Unsupervised pixel-level anomaly detection methods can successfully recognize and segment anomalies; however, existing techniques are designed for industrial settings with a fixed camera position. In the built environment, images are periodically captured by a camera operated manually or mounted on aerial or ground vehicles. The camera pose may vary widely between successive collections, voiding a fundamental assumption of existing anomaly detection approaches. To address this gap, we introduce the problem of Scene Anomaly Detection (Scene AD), where the goal is to detect anomalies from two sets of images: one set without anomalies and one set that may or may not contain anomalies. No labeled semantic segmentation data are provided for training. We propose a novel network, OmniAD, to tackle Scene AD by refining the reverse distillation anomaly detection method, leading to a 40% improvement in pixel-level anomaly detection. Additionally, we introduce two new data augmentation strategies that leverage novel view synthesis and camera localization to enhance generalization. We evaluate our approach both qualitatively and quantitatively on a new dataset, ToyCity, the first Scene AD dataset featuring multiple objects, as well as on the established single-object-centric dataset, MAD. Our method demonstrates marked improvement over baseline approaches, paving the way for robust anomaly detection in scenes with the real-world camera pose variations commonly observed in the built environment. https://drags99.github.io/OmniAD/
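For readers unfamiliar with reverse distillation, the scoring step it generally relies on can be sketched as follows: a frozen teacher and a trained student produce feature maps at several scales, and the per-pixel cosine distance between them is upsampled and summed into an anomaly map. This is a minimal NumPy sketch of that general idea, not the OmniAD implementation; the function names are hypothetical, the features are assumed to be given as `(C, H, W)` arrays, and nearest-neighbour upsampling on a square output is a simplification chosen to keep the example dependency-free.

```python
import numpy as np


def cosine_distance_map(teacher, student, eps=1e-8):
    """Per-pixel (1 - cosine similarity) between two (C, H, W) feature maps."""
    dot = (teacher * student).sum(axis=0)
    norm = np.linalg.norm(teacher, axis=0) * np.linalg.norm(student, axis=0)
    return 1.0 - dot / (norm + eps)


def upsample_nearest(score_map, size):
    """Nearest-neighbour upsampling of an (H, W) map to (size, size).

    Assumes size is an integer multiple of both H and W (a simplification).
    """
    h, w = score_map.shape
    if size % h != 0 or size % w != 0:
        raise ValueError("size must be a multiple of the map height and width")
    return np.repeat(np.repeat(score_map, size // h, axis=0), size // w, axis=1)


def anomaly_map(teacher_feats, student_feats, size):
    """Sum the upsampled cosine-distance maps over all feature scales."""
    total = np.zeros((size, size))
    for teacher, student in zip(teacher_feats, student_feats):
        total += upsample_nearest(cosine_distance_map(teacher, student), size)
    return total
```

Pixels where the student fails to reproduce the teacher's features receive high scores; thresholding the resulting map yields a pixel-level anomaly segmentation.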