Knowledge-aware question answering (KAQA) requires a model to answer questions over a knowledge base. It is essential for both open-domain and domain-specific QA, especially when language models alone cannot supply all the knowledge needed. Recent KAQA systems, which typically integrate linguistic knowledge from pre-trained language models (PLMs) with factual knowledge from knowledge graphs (KGs) to answer complex questions, have shown promising results. However, effectively fusing the representations from PLMs and KGs remains a bottleneck because of (i) the semantic and distributional gaps between them and (ii) the difficulty of reasoning jointly over the knowledge provided by both modalities. To address these two problems, we propose a Fine-grained Two-stage training framework (FiTs) to boost KAQA system performance. The first stage, knowledge adaptive post-training, aligns representations from the PLM and the KG, thereby bridging the modality gap between them. The second stage, knowledge-aware fine-tuning, improves the model's joint reasoning ability on top of the aligned representations: we fine-tune the post-trained model with two auxiliary self-supervised tasks in addition to the QA supervision. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on three benchmarks in the commonsense reasoning (CommonsenseQA and OpenBookQA) and medical question answering (MedQA-USMLE) domains.