Flexible and natural nonverbal reactions to human behavior remain a challenge for socially interactive agents (SIAs) that are predominantly animated using hand-crafted rules. While recently proposed machine-learning-based approaches to conversational behavior generation are a promising way to address this challenge, they have not yet been employed in SIAs. The primary reason for this is the lack of a software toolkit that integrates such approaches with SIA frameworks and meets the challenging real-time requirements of human-agent interaction scenarios. In our work, we present, for the first time, such a toolkit consisting of three main components: (1) real-time feature extraction capturing multi-modal social cues from the user; (2) behavior generation based on a recent state-of-the-art neural network approach; (3) visualization of the generated behavior supporting both FLAME-based and Apple ARKit-based interactive agents. We comprehensively evaluate the real-time performance of the whole framework and its components. In addition, we introduce pre-trained behavior generation models derived from psychotherapy sessions for domain-specific listening behaviors. Our software toolkit, pivotal for deploying and assessing SIAs' listening behavior in real time, is publicly available. Resources, including code and behavioral multi-modal features extracted from therapeutic interactions, are hosted at \url{https://daksitha.github.io/ReNeLib}.