Pose-based anomaly detection is a video-analysis technique that detects anomalous events or behaviors by examining human poses extracted from video frames. Using pose data alleviates privacy and ethical concerns, and pose-based models are computationally less complex than pixel-based approaches. However, relying on pose introduces its own challenges: skeleton data is noisy, important pixel information is lost, and the resulting features are not sufficiently rich. These problems are exacerbated by the lack of anomaly detection datasets that adequately represent real-world scenarios. In this work, we analyze and quantify the characteristics of two well-known video anomaly datasets to better understand the difficulties of pose-based anomaly detection. We go a step further and explore the discriminating power of pose and trajectory for video anomaly detection, and how their effectiveness depends on context. We believe these experiments contribute to a better understanding of pose-based anomaly detection and of the currently available datasets. This will help researchers approach the task of anomaly detection with a clearer perspective, accelerating the development of robust models with better performance.