End-to-end models have recently made significant strides in spoken question answering (QA). However, previous research has focused primarily on extractive span selection. This extractive approach is effective when the answer appears directly in the input. It falls short on abstractive questions, however, whose answers are not extracted verbatim but must be inferred from the given information. To bridge this gap, we introduce the first end-to-end Generative Spoken Question Answering (GSQA) model, which enables the system to perform abstractive reasoning. The main challenge in training GSQA is the absence of a spoken abstractive QA dataset. We therefore propose initializing from text models and leveraging an extractive QA dataset to transfer knowledge from the text generative model to the spoken generative model. Experimental results show that our model surpasses the previous extractive model by 3% on extractive QA datasets. Furthermore, GSQA is fine-tuned only on a spoken extractive QA dataset and never sees spoken abstractive QA data, yet it still closely matches the performance of the cascade model. In conclusion, GSQA shows the potential to generalize to a broad spectrum of questions, extending spoken question answering to abstractive QA. Our code is available at \href{https://voidful.github.io/GSQA}{https://voidful.github.io/GSQA}.