Spoken Question Answering (SQA) is essential for machines to answer a user's question by finding the answer span within a given spoken passage. SQA has previously been achieved without automatic speech recognition (ASR), avoiding recognition errors and out-of-vocabulary (OOV) problems. However, the real-world problem of open-domain SQA (openSQA), in which the machine must additionally first retrieve passages that may contain the answer from a spoken archive, has never been considered. This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of the openSQA problem. SpeechDPR learns a sentence-level semantic representation by distilling knowledge from a cascaded model of unsupervised ASR (UASR) and a text dense retriever (TDR). No manually transcribed speech data is needed. Initial experiments showed performance comparable to the cascaded UASR and TDR model, and significantly better performance when UASR was poor, verifying that this approach is more robust to speech recognition errors.
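To make the distillation idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the training objective, so the sketch assumes a common choice for dense retrievers: the student (a speech encoder) is trained so that its softmax distribution over question-passage dot-product scores matches that of the teacher (the UASR plus TDR cascade), measured by KL divergence. All function names and the toy vectors are hypothetical.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieval_log_probs(question_vec, passage_vecs):
    """Log-probability of each passage given the question (dot-product scoring)."""
    return log_softmax([dot(question_vec, p) for p in passage_vecs])


def distillation_loss(teacher_q, teacher_ps, student_q, student_ps):
    """KL(teacher || student) between the two retrieval distributions.

    Assumption: the teacher vectors come from a text retriever applied to
    UASR transcripts, the student vectors from a speech encoder.
    """
    log_t = retrieval_log_probs(teacher_q, teacher_ps)
    log_s = retrieval_log_probs(student_q, student_ps)
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))


# Toy example: one question and three candidate passages, 2-dimensional embeddings.
teacher_question = [1.0, 0.0]
teacher_passages = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
student_question = [0.5, 0.5]
student_passages = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]

loss = distillation_loss(
    teacher_question, teacher_passages, student_question, student_passages
)
print(f"distillation loss: {loss:.4f}")
```

The loss is zero when the student reproduces the teacher's ranking distribution exactly and positive otherwise. In an actual system the gradient of this loss would update only the speech encoder, with the teacher held fixed.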