This paper explores post-disaster analytics using multimodal deep learning models trained with a curriculum learning method. Post-disaster analytics plays a crucial role in mitigating the impact of disasters by providing timely and accurate insights into the extent of damage and the allocation of resources. We propose a curriculum learning strategy to enhance the performance of multimodal deep learning models. Curriculum learning emulates the progressive learning sequence of human education by training deep learning models on increasingly complex data. Our primary objective is to develop a curriculum-trained multimodal deep learning model for disaster analytics on the FloodNet\footnote{https://github.com/BinaLab/FloodNet-Challenge-EARTHVISION2021} dataset, combining visual question answering (VQA), which jointly processes image and text data, with semantic segmentation. To achieve this, a U-Net model is used for semantic segmentation and image encoding, and a custom-built text classifier is used for visual question answering. Existing curriculum learning methods rely on manually defined difficulty functions. We introduce a novel curriculum learning approach termed Dynamic Task and Weight Prioritization (DATWEP), which leverages a gradient-based method to automatically determine task difficulty during training, thereby eliminating the need for explicit difficulty computation. Integrating DATWEP into our multimodal model improves VQA performance. Source code is available at https://github.com/fualsan/DATWEP.