Detecting anomalies in human behavior is paramount for the timely recognition of dangerous situations, such as street fights or falls among the elderly. However, anomaly detection is difficult, both because anomalous events are rare and because it is an open-set recognition task: what is anomalous at inference has not been observed during training. We propose COSKAD, a novel model that encodes skeletal human motion with a graph convolutional network and learns to COntract SKeletal kinematic embeddings onto a latent hypersphere of minimum volume for Video Anomaly Detection. We propose three latent spaces: the commonly adopted Euclidean space and the novel spherical and hyperbolic spaces. All variants outperform the state of the art on the most recent UBnormal dataset, for which we contribute a human-related version with annotated skeletons. COSKAD also sets a new state of the art on the human-related versions of ShanghaiTech Campus and CUHK Avenue, with performance comparable to video-based methods. Source code and dataset will be released upon acceptance.
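To make the idea of contracting embeddings toward a hypersphere center more concrete, the following is a minimal sketch and not the COSKAD implementation. The abstract does not state the loss, so the sketch assumes a one-class objective in the spirit of Deep SVDD: the center is the mean of the training embeddings, the training loss is the mean distance to that center, and the anomaly score at inference is the distance of a new embedding from the center. The graph convolutional encoder is omitted, and embeddings are plain lists of floats. All function names are hypothetical, and the spherical (cosine) and hyperbolic (Poincaré ball) distances are standard textbook choices that may differ from those used in the paper.

```python
import math


def center_of(embeddings):
    """Mean of the training embeddings, used here as the fixed center (assumption)."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]


def euclidean_distance_sq(z, c):
    """Squared Euclidean distance between embedding z and center c."""
    return sum((a - b) ** 2 for a, b in zip(z, c))


def spherical_distance(z, c):
    """Cosine distance, 1 - cos(z, c); both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(z, c))
    norm_z = math.sqrt(sum(a * a for a in z))
    norm_c = math.sqrt(sum(b * b for b in c))
    return 1.0 - dot / (norm_z * norm_c)


def poincare_distance(z, c):
    """Geodesic distance in the Poincare ball; both points need norm < 1."""
    diff_sq = euclidean_distance_sq(z, c)
    z_sq = sum(a * a for a in z)
    c_sq = sum(b * b for b in c)
    return math.acosh(1.0 + 2.0 * diff_sq / ((1.0 - z_sq) * (1.0 - c_sq)))


def contraction_loss(embeddings, center, distance=euclidean_distance_sq):
    """Mean distance of normal training embeddings to the center.

    Minimizing this pulls embeddings of normal motion toward the center,
    which shrinks the region they occupy.
    """
    return sum(distance(e, center) for e in embeddings) / len(embeddings)


def anomaly_score(z, center, distance=euclidean_distance_sq):
    """Distance from the center; larger values suggest more anomalous motion."""
    return distance(z, center)
```

In this sketch, switching latent space only means passing a different `distance` function, for example `anomaly_score(z, center, distance=poincare_distance)`. A real model would also need the encoder to produce embeddings that respect the constraints of the chosen space, such as norm below 1 for the Poincaré ball.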