SSIVD-Net: A Novel Salient Super Image Classification & Detection Technique for Weaponized Violence

Detecting violence and weaponized violence in closed-circuit television (CCTV) footage requires a comprehensive approach. In this work, we introduce the \emph{Smart-City CCTV Violence Detection (SCVD)} dataset, designed specifically to facilitate learning the distribution of weapons in surveillance videos. To tackle the complexity of analyzing 3D surveillance video for violence recognition, we propose a novel technique called \emph{SSIVD-Net} (\textbf{S}alient-\textbf{S}uper-\textbf{I}mage for \textbf{V}iolence \textbf{D}etection). By using Salient-Super-Image representations, our method reduces the complexity, dimensionality, and information loss of 3D video data while improving inference, performance, and explainability. Considering the scalability and sustainability requirements of future smart cities, we also introduce the \emph{Salient-Classifier}, a novel architecture that combines a kernelized approach with a residual learning strategy. We evaluate variations of SSIVD-Net and the Salient-Classifier on our SCVD dataset and benchmark them against state-of-the-art (SOTA) models commonly employed in violence detection. Our approach exhibits significant improvements in detecting both weaponized and non-weaponized violence. By advancing the SOTA in violence detection, our work offers a practical and scalable solution suitable for real-world applications. The proposed methodology not only addresses the challenges of violence detection in CCTV footage but also contributes to the understanding of weapon distribution in smart surveillance. Ultimately, our findings should enable smarter and more secure cities and enhance public safety measures.
