Controlled AutoEncoders to Generate Faces from Voices

Multiple studies have shown a strong correlation between human vocal characteristics and facial features. However, existing approaches generate faces directly from voice, without exploring which features contribute to the observed correlations. One way to study this computationally is to rephrase the question as: "How much would a target face have to change in order to be perceived as the originator of a source voice?" With this in mind, we propose a framework that morphs a target face in response to a given voice, so that the changes to facial features are implicitly guided by learned voice-face correlations. The framework consists of a guided autoencoder that converts one face to another, controlled by a unique model-conditioning component called a gating controller, which modifies the reconstructed face based on input voice recordings. We evaluate the framework on the VoxCeleb and VGGFace datasets through human-subject studies and face retrieval. The experiments demonstrate the effectiveness of the proposed model.
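The abstract does not specify how the gating controller is built, so the following is only a rough illustration of the general idea of conditioning a face reconstruction on a voice signal. It assumes a generic multiplicative gate applied to the latent code of an autoencoder; the function name `gated_reconstruction`, the weight matrices, and the vector sizes are all hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(a):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def gated_reconstruction(face, voice, w_enc, w_dec, w_gate):
    """Reconstruct a face vector whose latent code is gated by a voice vector.

    Hypothetical sketch: this is a generic multiplicative gate,
    not the gating controller described in the paper.
    """
    code = np.tanh(face @ w_enc)      # latent face code from the encoder
    gate = sigmoid(voice @ w_gate)    # one gate value in (0, 1) per code dimension
    return (code * gate) @ w_dec      # decode the voice-gated code

# Toy usage with random weights and inputs (illustrative sizes only).
rng = np.random.default_rng(0)
w_enc = rng.normal(size=(16, 4))
w_dec = rng.normal(size=(4, 16))
w_gate = rng.normal(size=(6, 4))
face = rng.normal(size=16)
voice_a = rng.normal(size=6)
voice_b = rng.normal(size=6)
out_a = gated_reconstruction(face, voice_a, w_enc, w_dec, w_gate)
out_b = gated_reconstruction(face, voice_b, w_enc, w_dec, w_gate)
```

With the same input face, two different voice vectors yield two different reconstructions, which is the behaviour the abstract attributes to voice-controlled morphing.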


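Related content

Autoencoder

An autoencoder is an artificial neural network used to learn efficient data encodings in an unsupervised manner. Its aim is to learn a representation (encoding) of a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Along with the reduction side, a reconstruction side is learned, in which the autoencoder tries to generate, from the reduced encoding, a representation as close as possible to its original input, hence the name. Several variants of the basic model exist, which aim to force the learned representation of the input to take on useful properties. Autoencoders are applied effectively to many problems, from face recognition to capturing the semantic meaning of words.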
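As a minimal sketch of the reduction and reconstruction sides described above, the following trains a purely linear autoencoder with plain gradient descent on synthetic data. The function name `train_linear_autoencoder`, the learning rate, the step count, and the data sizes are illustrative assumptions; practical autoencoders usually use nonlinear layers and a deep learning framework.

```python
import numpy as np

def train_linear_autoencoder(x, code_dim, steps=300, lr=0.01, seed=0):
    """Train a linear autoencoder x -> code -> x_hat with gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(scale=0.1, size=(d, code_dim))  # encoder (reduction side)
    w_dec = rng.normal(scale=0.1, size=(code_dim, d))  # decoder (reconstruction side)
    losses = []
    for _ in range(steps):
        code = x @ w_enc          # reduced encoding
        x_hat = code @ w_dec      # reconstruction of the input
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the squared reconstruction error, up to a constant factor.
        grad_dec = code.T @ err / n
        grad_enc = x.T @ (err @ w_dec.T) / n
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses

# Synthetic 8-dimensional data that lies in a 2-dimensional subspace.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
w_enc, w_dec, losses = train_linear_autoencoder(x, code_dim=2)
```

The `losses` list records the mean squared reconstruction error at each step, so comparing its first and last entries shows whether the network has learned to rebuild its input from the 2-dimensional code.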
