Filled pauses (or fillers), such as "uh" and "um", are frequent in spontaneous speech and can serve as a turn-holding cue for the listener, indicating that the current speaker is not done yet. In this paper, we use the recently proposed Voice Activity Projection (VAP) model, a deep learning model trained to predict the dynamics of conversation, to analyse the effects of filled pauses on the expected turn-hold probability. The results show that, while filled pauses do indeed have a turn-holding effect, it is perhaps not as strong as might be expected, probably because other cues are redundant with it. We also find that the prosodic properties and position of the filler have a significant effect on the turn-hold probability. However, contrary to what has been suggested in previous work, there is no difference between "uh" and "um" in this regard.