Deep Variational Generative Models for Audio-visual Speech Separation - 专知论文

会员服务 ·

0

潜变量/隐变量 · 生成模型 · 分离的 · MoDELS · Better ·

2021 年 8 月 31 日

Deep Variational Generative Models for Audio-visual Speech Separation

翻译：视听语音隔音的深变生成模型

Viet-Nhat Nguyen,Mostafa Sadeghi,Elisa Ricci,Xavier Alameda-Pineda

from arxiv, Accepted to the 31st IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Oct. 25-28, 2021, Gold Coast, Queensland, Australia

In this paper, we are interested in audio-visual speech separation given a single-channel audio recording as well as visual information (lips movements) associated with each speaker. We propose an unsupervised technique based on audio-visual generative modeling of clean speech. More specifically, during training, a latent variable generative model is learned from clean speech spectrograms using a variational auto-encoder (VAE). To better utilize the visual information, the posteriors of the latent variables are inferred from mixed speech (instead of clean speech) as well as the visual data. The visual modality also serves as a prior for latent variables, through a visual network. At test time, the learned generative model (both for speaker-independent and speaker-dependent scenarios) is combined with an unsupervised non-negative matrix factorization (NMF) variance model for background noise. All the latent variables and noise parameters are then estimated by a Monte Carlo expectation-maximization algorithm. Our experiments show that the proposed unsupervised VAE-based method yields better separation performance than NMF-based approaches as well as a supervised deep learning-based technique.

翻译：在本文中,我们有兴趣通过单声道录音以及与每个发言者相关的视觉信息(短片运动)来进行视听语音分离。我们建议一种基于清洁言语的视听基因模型的不受监督的技术。更具体地说,在培训期间,利用变式自动电解码器(VAE)从清洁语音光谱中学习潜在的变异基因模型。为了更好地利用视觉信息,潜伏变量的后遗物是从混合语句(而不是干净的言语)和视觉数据中推断出来的。视觉模式也通过视觉网络作为潜在变量的先导。在测试时,学习的基因模型(对于依靠扬声器和依赖扬声器的情景)与背景噪音的非超超异变矩阵因子(NMF)变异模型相结合。为了更好地利用视觉信息,所有潜在变量和噪声参数随后由蒙特卡洛期望-摩西化算法估算。我们的实验显示,拟议的以未超式VAE为基础的方法比以NMF为基础的深层次学习技术产生更好的分离性。

0

相关内容

潜变量/隐变量

潜变量/隐变量

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【经典书】线性代数，436页pdf

专知会员服务

78+阅读 · 2021年3月16日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【硬核课】CMU《多模态机器学习》2020课程，附课件与视频

【硬核课】CMU《多模态机器学习》2020课程，附课件与视频

专知会员服务

138+阅读 · 2020年9月3日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

专知会员服务

26+阅读 · 2020年2月16日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

专知会员服务

208+阅读 · 2019年9月30日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

LibRec 精选：推荐的可解释性[综述]

LibRec 精选：推荐的可解释性[综述]

LibRec智能推荐

10+阅读 · 2018年5月4日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

可解释的CNN

可解释的CNN

CreateAMind

18+阅读 · 2017年10月5日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Conditional Variational Autoencoder for Learned Image Reconstruction

Arxiv

0+阅读 · 2021年10月25日

Speech Enhancement Based on Cyclegan with Noise-informed Training

Arxiv

0+阅读 · 2021年10月19日

A cappella: Audio-visual Singing Voice Separation

Arxiv

0+阅读 · 2021年10月18日

LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech

Arxiv

0+阅读 · 2021年10月18日

Multi-Modal Pre-Training for Automated Speech Recognition

Arxiv

0+阅读 · 2021年10月12日

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Arxiv

7+阅读 · 2021年6月11日

Hierarchical Generative Modeling for Controllable Speech Synthesis

Hierarchical Generative Modeling for Controllable Speech Synthesis

Arxiv

3+阅读 · 2018年12月27日

Learning Implicit Fields for Generative Shape Modeling

Learning Implicit Fields for Generative Shape Modeling

Arxiv

11+阅读 · 2018年12月6日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

Improving Neural Question Generation using Answer Separation

Improving Neural Question Generation using Answer Separation

Arxiv

3+阅读 · 2018年9月7日

VIP会员

文章信息

相关主题

潜变量/隐变量

最新内容

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

专知会员服务

0+阅读 · 今天15:55

GNN跨域综述：从消息传递到图基础模型

GNN跨域综述：从消息传递到图基础模型

专知会员服务

0+阅读 · 今天15:53

无人机自主控制与人工智能：系统性综述

无人机自主控制与人工智能：系统性综述

专知会员服务

11+阅读 · 今天7:25

巡飞弹与反无人机系统——现代战场的两大支柱

巡飞弹与反无人机系统——现代战场的两大支柱

专知会员服务

3+阅读 · 今天6:54

《打造“黄金舰队”》57页报告

《打造“黄金舰队”》57页报告

专知会员服务

3+阅读 · 今天6:52

《北约数字教官网络发展路径》128页报告

《北约数字教官网络发展路径》128页报告

专知会员服务

2+阅读 · 今天6:33

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

7+阅读 · 6月25日

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

6+阅读 · 6月25日

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

10+阅读 · 6月25日

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

8+阅读 · 6月25日

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

8+阅读 · 6月25日

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

8+阅读 · 6月25日

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

专知会员服务

10+阅读 · 6月25日

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

专知会员服务

9+阅读 · 6月25日

《国防领域敏感性分析白皮书》

《国防领域敏感性分析白皮书》

专知会员服务

9+阅读 · 6月25日

相关VIP内容

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【经典书】线性代数，436页pdf

专知会员服务

78+阅读 · 2021年3月16日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【硬核课】CMU《多模态机器学习》2020课程，附课件与视频

【硬核课】CMU《多模态机器学习》2020课程，附课件与视频

专知会员服务

138+阅读 · 2020年9月3日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

专知会员服务

26+阅读 · 2020年2月16日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

专知会员服务

208+阅读 · 2019年9月30日

热门VIP内容

开通专知VIP会员享更多权益服务

GNN跨域综述：从消息传递到图基础模型

巡飞弹与反无人机系统——现代战场的两大支柱

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

无人机自主控制与人工智能：系统性综述

相关资讯

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

LibRec 精选：推荐的可解释性[综述]

LibRec 精选：推荐的可解释性[综述]

LibRec智能推荐

10+阅读 · 2018年5月4日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

可解释的CNN

可解释的CNN

CreateAMind

18+阅读 · 2017年10月5日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Conditional Variational Autoencoder for Learned Image Reconstruction

Arxiv

0+阅读 · 2021年10月25日

Speech Enhancement Based on Cyclegan with Noise-informed Training

Arxiv

0+阅读 · 2021年10月19日

A cappella: Audio-visual Singing Voice Separation

Arxiv

0+阅读 · 2021年10月18日

LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech

Arxiv

0+阅读 · 2021年10月18日

Multi-Modal Pre-Training for Automated Speech Recognition

Arxiv

0+阅读 · 2021年10月12日

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Arxiv

7+阅读 · 2021年6月11日

Hierarchical Generative Modeling for Controllable Speech Synthesis

Hierarchical Generative Modeling for Controllable Speech Synthesis

Arxiv

3+阅读 · 2018年12月27日

Learning Implicit Fields for Generative Shape Modeling

Learning Implicit Fields for Generative Shape Modeling

Arxiv

11+阅读 · 2018年12月6日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

Improving Neural Question Generation using Answer Separation

Improving Neural Question Generation using Answer Separation

Arxiv

3+阅读 · 2018年9月7日

微信扫码咨询专知VIP会员