META4: Semantically-Aligned Generation of Metaphoric Gestures Using Self-Supervised Text and Speech Representation

Image Schemas are recurring cognitive patterns that shape how we conceptualize and reason about the concepts expressed in speech. These patterns are deeply embedded in our cognitive processes and are reflected in our bodily expressions, including gestures. Metaphoric gestures in particular have characteristics and semantic meanings that align with Image Schemas, which lets them represent abstract concepts visually. The shape and form of a gesture can convey an abstract concept: extending the forearm and hand, or tracing a line with the hand, visually represents the Image Schema PATH. Previous behavior generation models have focused primarily on using speech (acoustic features and text) to drive the behavior generation of virtual agents. They have not considered key semantic information, such as that carried by Image Schemas, to generate metaphoric gestures effectively. To address this limitation, we introduce META4, a deep learning approach that generates metaphoric gestures from both speech and Image Schemas. Our approach has two primary goals: computing Image Schemas from input text to capture the underlying semantic and metaphorical meaning, and generating metaphoric gestures driven by speech and the computed Image Schemas. Our approach is the first method for generating speech-driven metaphoric gestures while leveraging Image Schemas. We demonstrate the effectiveness of our approach and highlight the importance of both speech and Image Schemas in modeling metaphoric gestures.

