Image Schemas are recurring cognitive patterns that shape how we conceptualize and reason about the concepts expressed in speech. These patterns are deeply embedded in our cognitive processes and are reflected in our bodily expressions, including gestures. In particular, metaphoric gestures share essential characteristics and semantic meanings with Image Schemas, which lets them represent abstract concepts visually. The shape and form of a gesture can convey an abstract concept: for example, extending the forearm and hand, or tracing a line with the hand, visually represents the Image Schema PATH. Previous behavior generation models for virtual agents have been driven primarily by speech (acoustic features and text). They have not considered key semantic information, such as that carried by Image Schemas, to generate metaphoric gestures effectively. To address this limitation, we introduce META4, a deep learning approach that generates metaphoric gestures from both speech and Image Schemas. Our approach has two primary goals: computing Image Schemas from input text to capture the underlying semantic and metaphorical meaning, and generating metaphoric gestures driven by speech and the computed Image Schemas. Our approach is the first to generate speech-driven metaphoric gestures while leveraging Image Schemas. We demonstrate the effectiveness of our approach and highlight the importance of both speech and Image Schemas in modeling metaphoric gestures.