Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition

Source: arXiv. Project page: https://rendchevi.github.io/daisy-tts. Updates: (1) fixed typos, missing references, and layout; (2) revised the explanation of the emotion classifier/discriminator.

We often express emotions verbally in a multifaceted manner: they may vary in intensity, and they may be expressed not as a single emotion but as a mixture of emotions. This wide spectrum of emotions is well studied in the structural model of emotions, which represents a variety of emotions as derivative products of primary emotions with varying degrees of intensity. In this paper, we propose an emotional text-to-speech design that simulates a wider spectrum of emotions, grounded in the structural model. Our proposed design, Daisy-TTS, incorporates a prosody encoder to learn an emotionally separable prosody embedding as a proxy for emotion. This emotion representation allows the model to simulate: (1) primary emotions, as learned from the training samples; (2) secondary emotions, as a mixture of primary emotions; (3) intensity levels, by scaling the emotion embedding; and (4) emotion polarity, by negating the emotion embedding. Through a series of perceptual evaluations, Daisy-TTS demonstrated overall higher emotional speech naturalness and emotion perceivability compared to the baseline.
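The three embedding operations named in the abstract (mixing, scaling, negating) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the emotion names, the 3-dimensional toy vectors, the helper names `mix`, `scale`, and `negate`, and the choice of a weighted average as the mixing rule. The abstract does not specify how Daisy-TTS combines embeddings, so this is not the authors' implementation.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical primary-emotion embeddings. Real prosody embeddings would be
# learned by the prosody encoder and have far more dimensions.
PRIMARY: Dict[str, Vector] = {
    "joy": [1.0, 0.0, 0.5],
    "sadness": [-1.0, 0.5, 0.0],
    "anger": [0.0, 1.0, -0.5],
    "surprise": [0.5, -0.5, 1.0],
}


def scale(embedding: Vector, factor: float) -> Vector:
    """Intensity level: multiply every component by a scalar factor."""
    return [factor * x for x in embedding]


def negate(embedding: Vector) -> Vector:
    """Emotion polarity: flip the sign of every component."""
    return scale(embedding, -1.0)


def mix(a: Vector, b: Vector, weight: float = 0.5) -> Vector:
    """Secondary emotion as a weighted average of two primary embeddings.

    The weighted average is an assumed mixing rule, not taken from the paper.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


# Example: a hypothetical joy + surprise mixture at doubled intensity.
blended = mix(PRIMARY["joy"], PRIMARY["surprise"])
intense_blend = scale(blended, 2.0)
opposite_of_joy = negate(PRIMARY["joy"])
```

In a full system, a vector such as `intense_blend` would condition the speech decoder in place of an embedding extracted from reference audio. Whether simple vector arithmetic like this yields perceptually meaningful emotions is exactly what the paper's perceptual evaluations test.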

