Human emotion synthesis is a crucial aspect of affective computing. It uses computational methods to mimic and convey human emotions across various modalities, with the goal of enabling more natural and effective human-computer interaction. Recent advances in generative models, such as Autoencoders, Generative Adversarial Networks, Diffusion Models, Large Language Models, and Sequence-to-Sequence Models, have significantly contributed to the development of this field. However, comprehensive reviews of this area remain scarce. This paper aims to address this gap by providing a thorough and systematic overview of recent advances in human emotion synthesis based on generative models. Specifically, the review first presents the review methodology, the emotion models involved, the mathematical principles of generative models, and the datasets used. It then covers the application of different generative models to emotion synthesis across a variety of modalities, including facial images, speech, and text, and examines mainstream evaluation metrics. Finally, the review presents major findings and suggests future research directions, providing a comprehensive understanding of the role of generative technology in the nuanced domain of emotion synthesis.