Disfluencies (i.e., interruptions in the regular flow of speech) are ubiquitous in spoken discourse. Fillers ("uh", "um") are the most frequently occurring kind of disfluency. Yet, to the best of our knowledge, there is no resource that brings together the research perspectives on these speech events that influence Spoken Language Understanding (SLU). The aim of this article is to survey a breadth of perspectives in a holistic way: from the underlying (psycho)linguistic theory of fillers, to their annotation and treatment in Automatic Speech Recognition (ASR) and SLU systems, to, lastly, their study from a generation standpoint. The article aims to present these perspectives in an approachable way to the SLU and Conversational AI community, and to discuss what we believe are the trends and challenges in each area moving forward.