The recent explosion of question answering (QA) datasets and models has increased interest in generalizing models across multiple domains and formats, either by training on multiple datasets or by combining multiple models. Despite the promising results of multi-dataset models, some domains or QA formats may require specific architectures, so the adaptability of these models may be limited. In addition, current approaches for combining models disregard cues such as question-answer compatibility. In this work, we propose to combine expert agents with a novel, flexible, and training-efficient architecture that considers questions, answer predictions, and answer-prediction confidence scores to select the best answer from a list of answer candidates. Through quantitative and qualitative experiments, we show that our model i) creates a collaboration between agents that outperforms previous multi-agent and multi-dataset approaches in both in-domain and out-of-domain scenarios, ii) is highly data-efficient to train, and iii) can be adapted to any QA format. To foster future development of multi-agent systems, we release our code and a dataset of answer predictions from expert agents for 16 QA datasets at https://github.com/UKPLab/MetaQA.