Anomaly detection is often formulated under the assumption that abnormality is an intrinsic property of an observation, independent of context. This assumption breaks down in many real-world settings, where the same object or action may be normal or anomalous depending on latent contextual factors (e.g., running on a track versus on a highway). We revisit \emph{contextual anomaly detection}, classically defined as context-dependent abnormality, and operationalize it in the visual domain, where anomaly labels depend on subject--context compatibility rather than intrinsic appearance. To enable systematic study of this setting, we introduce CAAD-3K, a benchmark that isolates contextual anomalies by controlling subject identity while varying context. We further propose a conditional compatibility learning framework that leverages vision--language representations to model subject--context relationships under limited supervision. Our method substantially outperforms existing approaches on CAAD-3K and achieves state-of-the-art performance on MVTec-AD and VisA, demonstrating that modeling context dependence complements traditional structural anomaly detection. Our code and dataset will be publicly released.