Voice input has become increasingly popular on mobile devices and appears to be overtaking text input almost entirely. Through voice, a voice search (VS) system can provide a more natural way to meet users' information needs. However, errors from the automatic speech recognition (ASR) system can be catastrophic for the VS system. We build on recent lightweight autoregressive retrieval models, which have the potential to be deployed on mobile devices and could lead to a more secure and personal VS assistant. This paper presents a novel study of VS leveraging autoregressive retrieval and tackles a crucial problem facing VS, namely the performance drop caused by ASR noise, via data augmentation and contrastive learning, showing how explicitly and implicitly modeling the noise patterns can alleviate the problem. A series of experiments conducted on the Open-Domain Spoken Question Answering (ODSQA) dataset confirms our approach's effectiveness and robustness relative to several strong baseline systems.