Response timing judgment is a critical component of interactive speech agents. Although substantial prior work exists on turn-taking modeling and voice wake-up, little research has addressed response timing judgment that stays continuously aligned with user intent. To address this gap, we propose the Tap-to-Adapt framework, in which users naturally activate or interrupt the agent through tap interactions, and these taps are used to construct online learning labels for the response timing model. Data-driven experiments and user studies show that, within this framework, a Dilated TCN and a sequential replay strategy both contribute significantly. Additionally, we develop an evaluation and continuous data-mining system tailored to the Tap-to-Adapt framework, with which we collected approximately 20,000 samples in user studies involving 20 participants.
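The abstract names three mechanisms without spelling them out: tap events turned into online labels, a dilated temporal convolution, and sequential replay. The sketch below is a minimal, dependency-free illustration of each under stated assumptions, not the authors' implementation. The labeling rule (frames in a short window before an activating tap become positives, frames before an interrupting tap become negatives, everything else is ignored), the function and class names (`taps_to_labels`, `dilated_causal_conv`, `receptive_field`, `SequentialReplay`), and the reading of "sequential replay" as replaying stored sessions in chronological order alongside each new one are all hypothetical.

```python
from collections import deque
from typing import List, Sequence


def taps_to_labels(num_frames: int, activate: Sequence[int],
                   interrupt: Sequence[int], window: int = 3) -> List[int]:
    """Hypothetical rule: 1 = respond, 0 = do not respond, -1 = unlabeled.

    The `window` frames ending at an activating tap are labeled 1; the
    `window` frames ending at an interrupting tap are labeled 0.
    Interrupt labels are written last, so they win where windows overlap.
    """
    labels = [-1] * num_frames
    for taps, value in ((activate, 1), (interrupt, 0)):
        for t in taps:
            for i in range(max(0, t - window + 1), min(num_frames, t + 1)):
                labels[i] = value
    return labels


def dilated_causal_conv(x: Sequence[float], weights: Sequence[float],
                        dilation: int) -> List[float]:
    """One dilated causal 1-D convolution with implicit left zero-padding.

    Output at time t only sees x[t], x[t - d], x[t - 2d], ..., so the
    model never looks at future audio frames.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in frames) of a stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


class SequentialReplay:
    """Hypothetical replay buffer that keeps past sessions in arrival order."""

    def __init__(self, capacity: int) -> None:
        self.buffer: deque = deque(maxlen=capacity)

    def batch_for(self, new_session):
        # Replay stored sessions oldest-first, then the newest session,
        # and remember the new session for later updates.
        batch = list(self.buffer) + [new_session]
        self.buffer.append(new_session)
        return batch


if __name__ == "__main__":
    print(taps_to_labels(10, activate=[5], interrupt=[8]))
    print(dilated_causal_conv([1, 2, 3, 4], [1.0, 1.0], dilation=2))
    print(receptive_field(2, [1, 2, 4, 8]))
```

Doubling the dilation at each layer is the usual TCN design choice: with kernel size 2 and dilations 1, 2, 4, 8 the stack covers 16 frames of context while adding only four layers, which suits a timing model that must react in real time.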