Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is expensive and time-consuming. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, most current work on ToD uses text as the sole input modality, neglecting the additional challenges posed by imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by 1) framing it as a text generation task, 2) additionally grounding text generation in the audio modality, and 3) conditioning on available external knowledge (e.g., a predefined list of possible slot values). We show that combining both modalities (text and audio) within the KA2G framework improves robustness against ASR errors. Further, the knowledge-aware slot-value generator in KA2G, implemented via a pointer generator mechanism, particularly benefits few-shot and zero-shot learning. Experiments on the standard speech-based single-turn SLURP dataset and on a multi-turn dataset extracted from a commercial ToD system show strong and consistent gains over prior work, especially in few-shot and zero-shot setups.
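The knowledge-aware slot-value generator is described as a pointer generator mechanism. As a rough illustration, the following is a minimal sketch of the standard pointer-generator mixing step (See et al., 2017). It assumes that the copy source is a tokenised list of candidate slot values and that the generation probability `p_gen` and the attention weights are already computed. It is not the KA2G implementation: the abstract does not specify how attention over the knowledge source is computed or how the audio signal enters this step. The function name `pointer_generator_mix` and the toy values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def pointer_generator_mix(
    p_vocab: Dict[str, float],
    copy_tokens: List[str],
    attention: List[float],
    p_gen: float,
) -> Dict[str, float]:
    """Mix a generation distribution with a copy distribution (hypothetical helper).

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention weights a_i
    over copy-source positions i whose token x_i == w.
    """
    if len(copy_tokens) != len(attention):
        raise ValueError("copy_tokens and attention must have equal length")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    final: Dict[str, float] = defaultdict(float)
    # Generation path: scale the decoder's vocabulary distribution by p_gen.
    for token, prob in p_vocab.items():
        final[token] += p_gen * prob
    # Copy path: scatter attention mass onto the tokens it points at.
    # Tokens missing from p_vocab (e.g. rare slot values taken from the
    # assumed candidate list) can still receive probability this way.
    for token, weight in zip(copy_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


if __name__ == "__main__":
    # Toy decoder distribution and an assumed list of candidate slot-value tokens.
    mixed = pointer_generator_mix(
        p_vocab={"london": 0.2, "paris": 0.5, "the": 0.3},
        copy_tokens=["paris", "heathrow"],
        attention=[0.25, 0.75],
        p_gen=0.4,
    )
    print({k: round(v, 2) for k, v in mixed.items()})
    # prints {'london': 0.08, 'paris': 0.35, 'the': 0.12, 'heathrow': 0.45}
```

In this toy example, "heathrow" is absent from the decoder vocabulary yet ends up as the most probable token, because the copy path moves attention mass onto it. This is the property that makes copying from a list of known values attractive when few or no labelled examples of a slot value exist.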