Visual Question Answering (VQA) systems are known for their poor performance on out-of-distribution datasets, an issue that previous works have addressed through ensemble learning, answer re-ranking, or artificially growing the training set. In this work, we show for the first time that robust Visual Question Answering is attainable by simply enhancing the training strategy. Our proposed approach, Task Progressive Curriculum Learning (TPCL), breaks the main VQA problem into smaller, easier tasks based on the question type, then progressively trains the model on a carefully crafted sequence of those tasks. We further support the method with a novel distribution-based difficulty measurer. Our approach is conceptually simple, model-agnostic, and easy to implement. We demonstrate TPCL's effectiveness through a comprehensive evaluation on standard datasets. Without data augmentation or an explicit debiasing mechanism, it achieves state-of-the-art performance on the VQA-CP v2, VQA-CP v1, and VQA v2 datasets. Extensive experiments show that TPCL outperforms the most competitive robust VQA approaches by more than 5% and 7% on VQA-CP v2 and VQA-CP v1, respectively, and can boost the VQA baseline backbone's performance by up to 28.5%.
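The curriculum described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grouping key `question_type`, the use of answer-distribution entropy as the difficulty proxy, and the cumulative task pool are all assumptions made for the sake of the sketch.

```python
from collections import defaultdict
from math import log


def entropy(counts):
    # Shannon entropy of an answer-frequency distribution.
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())


def build_curriculum(samples):
    """Group VQA samples into tasks by question type and order the tasks
    easiest-first. Answer-distribution entropy is an illustrative
    stand-in for the paper's distribution-based difficulty measurer."""
    tasks = defaultdict(list)
    for s in samples:
        tasks[s["question_type"]].append(s)

    def difficulty(task_samples):
        # A task whose answers are highly skewed (low entropy) is
        # assumed easier than one with many equally likely answers.
        counts = defaultdict(int)
        for s in task_samples:
            counts[s["answer"]] += 1
        return entropy(counts)

    return sorted(tasks.values(), key=difficulty)


def progressive_train(model, curriculum, train_fn):
    """Train on a growing pool of tasks: each stage adds the next
    hardest task and re-trains on everything seen so far."""
    pool = []
    for task in curriculum:
        pool.extend(task)
        train_fn(model, pool)
```

Any backbone can be plugged in through `train_fn`, which is what makes the strategy model-agnostic: the curriculum only reorders and partitions the data.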