SSIVD-Net: A Novel Salient Super Image Classification & Detection Technique for Weaponized Violence

Detecting violence and weaponized violence in closed-circuit television (CCTV) footage requires a comprehensive approach. In this work, we introduce the \emph{Smart-City CCTV Violence Detection (SCVD)} dataset, designed specifically to facilitate learning the distribution of weapons in surveillance videos. To address the complexity of analyzing 3D surveillance video for violence recognition, we propose a novel technique, \emph{SSIVD-Net} (\textbf{S}alient-\textbf{S}uper-\textbf{I}mage for \textbf{V}iolence \textbf{D}etection). By representing videos as Salient-Super-Images, our method reduces the complexity and dimensionality of 3D video data while limiting information loss, and it improves inference efficiency, performance, and explainability. Considering the scalability and sustainability requirements of future smart cities, we also introduce the \emph{Salient-Classifier}, a novel architecture that combines a kernelized approach with a residual learning strategy. We evaluate variants of SSIVD-Net and the Salient-Classifier on our SCVD dataset and benchmark them against state-of-the-art (SOTA) models commonly used in violence detection. Our approach shows significant improvements in detecting both weaponized and non-weaponized violence. By advancing the SOTA in violence detection, our work offers a practical and scalable solution for real-world applications. Beyond addressing the challenges of violence detection in CCTV footage, the proposed methodology also contributes to understanding weapon distribution in smart surveillance. Ultimately, we expect these findings to help build smarter and more secure cities and to strengthen public safety measures.
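The abstract does not say how a Salient-Super-Image is built. A common way to turn a clip into a single image is to tile several frames into a grid, and the sketch below assumes that layout. The function name `salient_super_image`, the 3x3 grid, and the saliency score (mean absolute difference to the previous frame, a simple motion-energy proxy) are all hypothetical stand-ins, not the SSIVD-Net procedure. The sketch only illustrates how a (T, H, W, C) video collapses into one 2D image that an ordinary image classifier can consume.

```python
import numpy as np


def salient_super_image(video: np.ndarray, rows: int = 3, cols: int = 3) -> np.ndarray:
    """Tile the rows*cols highest-scoring frames of a (T, H, W, C) clip into one image.

    Hypothetical saliency proxy: mean absolute difference to the previous frame.
    Returns an array of shape (rows*H, cols*W, C) with the input dtype.
    """
    if video.ndim != 4:
        raise ValueError("expected video of shape (T, H, W, C)")
    t, h, w, c = video.shape
    k = rows * cols
    if t < k:
        raise ValueError(f"need at least {k} frames, got {t}")

    # Work in float so uint8 subtraction cannot wrap around.
    frames = video.astype(np.float32)
    # diffs[i] = motion energy between frame i and frame i + 1 (length T - 1).
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2, 3))
    # Frame 0 has no predecessor, so reuse the first difference as its score.
    scores = np.concatenate([diffs[:1], diffs])

    # Keep the k highest-scoring frames, then restore their temporal order.
    idx = np.sort(np.argsort(scores)[-k:])
    selected = video[idx]  # (k, H, W, C)

    # Lay the frames out row by row: (rows, cols, H, W, C) -> (rows*H, cols*W, C).
    grid = selected.reshape(rows, cols, h, w, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(rows * h, cols * w, c)
```

For example, a 12-frame clip of 4x5 RGB frames yields a 12x15x3 super image. Keeping the selected frames in temporal order means that reading the grid left to right, top to bottom follows the original timeline.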