Disfluencies (i.e., interruptions in the regular flow of speech) are ubiquitous in spoken discourse. Fillers ("uh", "um") are the most frequently occurring kind of disfluency. Yet, to the best of our knowledge, there is no resource that brings together the research perspectives on these speech events that influence Spoken Language Understanding (SLU). The aim of this article is to survey a breadth of perspectives in a holistic way: from the underlying (psycho)linguistic theory, to their annotation and treatment in Automatic Speech Recognition (ASR) and SLU systems, and finally to their study from a generation standpoint. The article aims to present these perspectives in a way that is approachable for the SLU and Conversational AI community, and to discuss what we believe are the trends and challenges in each area going forward.