Automated visual firearms classification from RGB images is an important real-world task with applications in public space security, intelligence gathering and law enforcement investigations. When applied to images crawled at scale from the World Wide Web (including social media and dark Web sites), it can serve as an important component of systems that attempt to identify criminal firearms trafficking networks by analyzing Big Data from open-source intelligence. Deep Neural Networks (DNNs) are the state-of-the-art methodology for this task, with Convolutional Neural Networks (CNNs) typically being employed. The common transfer learning approach consists of pretraining on a large-scale, generic annotated dataset for whole-image classification, such as ImageNet-1k, and then finetuning the DNN on a smaller, annotated, task-specific downstream dataset for visual firearms classification. Neither Vision Transformer (ViT) neural architectures nor Self-Supervised Learning (SSL) approaches have so far been evaluated on this critical task.
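The pretrain-then-finetune recipe described above can be illustrated with a dependency-free toy sketch. It is not the authors' pipeline: the fixed matrix `BACKBONE_W` is a hypothetical stand-in for weights learned on a generic dataset such as ImageNet-1k, the four 2-D points stand in for a small annotated downstream dataset, and only the simplest finetuning variant is shown (frozen backbone, newly trained classification head). All names and values are assumptions made for illustration.

```python
import math

# Hypothetical stand-in for a pretrained backbone: a fixed 2 -> 4 feature map.
# In practice these weights would come from pretraining on a large generic dataset.
BACKBONE_W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, -1.0]]


def backbone(x):
    """Frozen feature extractor; its weights are never updated below."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE_W]


def predict(head_w, head_b, x):
    """Task-specific head: logistic regression on top of backbone features."""
    z = sum(w * f for w, f in zip(head_w, backbone(x))) + head_b
    return 1.0 / (1.0 + math.exp(-z))


def finetune_head(data, epochs=200, lr=0.5):
    """Train only the head with SGD on the binary cross-entropy loss."""
    head_w, head_b = [0.0] * len(BACKBONE_W), 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = backbone(x)
            err = predict(head_w, head_b, x) - y  # gradient of BCE w.r.t. the logit
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return head_w, head_b


# Toy downstream dataset (assumed): two classes of 2-D points with binary labels.
downstream = [
    ((1.0, 1.0), 1),
    ((0.8, 1.2), 1),
    ((-1.0, -1.0), 0),
    ((-1.2, -0.7), 0),
]

backbone_before = [row[:] for row in BACKBONE_W]
head_w, head_b = finetune_head(downstream)
accuracy = sum(
    (predict(head_w, head_b, x) > 0.5) == bool(y) for x, y in downstream
) / len(downstream)
```

In a real setting the backbone would be a CNN or ViT with pretrained weights, and some or all of its layers are often unfrozen and updated at a small learning rate rather than kept fixed as here.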