Generalizing beyond in-domain experience to out-of-distribution data is of paramount importance in AI. Recently, state-of-the-art Visual Question Answering (VQA) models have shown impressive performance on in-domain data. This is partly owing to language prior bias, which, however, hinders their generalization in practice. This paper offers new insights, from an empirical perspective, into how the language modality influences VQA performance. To this end, we conducted a series of experiments on six models. The results revealed two findings. First, apart from the prior bias caused by question types, postfix-related bias also contributes notably to inducing biases. Second, training VQA models with word-sequence-related variant questions improved performance on the out-of-distribution benchmark, and LXMERT even achieved a 10-point gain without adopting any debiasing method. We examined the underlying reasons for these results and put forward several simple proposals to reduce the models' dependency on language priors. Experimental results demonstrated the effectiveness of the proposed method in improving performance on the out-of-distribution benchmark VQA-CPv2. We hope this study inspires new insights for future research on designing bias-reduction approaches.