Socially Interactive Agents (SIAs) are physical or virtual embodied agents that display behavior similar to human multimodal behavior. Modeling SIAs' non-verbal behavior, such as speech and facial gestures, has always been a challenging task, given that a SIA can take the role of a speaker or a listener. In both roles, a SIA must emit appropriate behavior adapted to its own speech, its previous behaviors (intra-personal), and the user's behaviors (inter-personal). We propose AMII, a novel approach to synthesizing adaptive facial gestures for SIAs that interact with users and act interchangeably as a speaker or a listener. AMII is characterized by a modality memory encoding schema (where a modality corresponds to either speech or facial gestures) and uses attention mechanisms to capture the intra-personal and inter-personal relationships. We validate our approach through objective evaluations and by comparing it with state-of-the-art approaches.