Current Spoken Dialogue Systems (SDSs) often act as passive listeners that respond only after receiving user speech. To achieve more human-like dialogue, we propose a novel future prediction architecture that allows an SDS to anticipate the user's future affective reactions from its own current behaviors, before the user speaks. In this work, we investigate two scenarios: speech and laughter. For speech, we propose to predict the user's future emotion based on its temporal relationship with the system's current emotion and its causal relationship with the system's current Dialogue Act (DA). For laughter, we propose to predict the occurrence and type of the user's laughter from the system's laughter behaviors in the current turn. A preliminary analysis of human-robot dialogue demonstrated synchrony in the emotions and laughter displayed by the human and robot, as well as DA-emotion causality in their dialogue. These findings support the feasibility of our architecture and its potential to contribute to the development of an anticipatory SDS.