In an emergency, crisis response agencies need to assess the situation on the ground quickly and accurately in order to deploy relevant services and resources. Authorities, however, often have to make decisions based on limited information, because data on affected regions can be scarce until local response services provide first-hand reports. The widespread availability of smartphones with high-quality cameras has made citizen journalism through social media a valuable source of information for crisis responders, but analyzing the large volume of images that citizens post requires more time and effort than is typically available. To address this issue, this paper proposes using state-of-the-art deep neural models for automatic image classification and tagging, specifically by adapting transformer-based architectures to crisis image classification (CrisisViT). We leverage the new Incidents1M crisis image dataset to develop a range of new transformer-based image classification models. Through experiments on the standard Crisis Image Benchmark dataset, we demonstrate that the CrisisViT models significantly outperform previous approaches on emergency type, image relevance, humanitarian category, and damage severity classification. Additionally, we show that the new Incidents1M dataset can further augment the CrisisViT models, resulting in an additional 1.25% absolute accuracy gain.
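To make the phrase "absolute accuracy gain" concrete, the following minimal sketch computes classification accuracy and contrasts an absolute gain (in percentage points) with a relative gain. The function names, the toy damage-severity labels, and both sets of predictions are hypothetical and chosen only for illustration; they are not taken from the paper's experiments or its evaluation code.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def absolute_gain(baseline_acc, new_acc):
    """Difference between two accuracies, in percentage points."""
    return (new_acc - baseline_acc) * 100


def relative_gain(baseline_acc, new_acc):
    """Improvement expressed as a percentage of the baseline accuracy."""
    return (new_acc - baseline_acc) / baseline_acc * 100


# Hypothetical toy data for a damage-severity task (not from the paper).
labels = ["severe", "mild", "none", "severe", "mild", "none", "severe", "mild"]
baseline_preds = ["severe", "mild", "none", "mild", "mild", "none", "none", "mild"]
improved_preds = ["severe", "mild", "none", "severe", "mild", "none", "none", "mild"]

baseline_acc = accuracy(baseline_preds, labels)  # 6 of 8 correct
improved_acc = accuracy(improved_preds, labels)  # 7 of 8 correct

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"improved accuracy: {improved_acc:.3f}")
print(f"absolute gain: {absolute_gain(baseline_acc, improved_acc):.2f} points")
print(f"relative gain: {relative_gain(baseline_acc, improved_acc):.2f}%")
```

An absolute gain is the plain difference between two accuracy figures, so under this reading the reported 1.25% is a difference in percentage points rather than a proportion of the earlier accuracy.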