Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling - 专知论文

会员服务 ·

0

离散化 · 稳健性 · 语言模型化 · MoDELS · 表示 ·

2023 年 5 月 29 日

Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling

翻译：面向生成式口语语言建模的增强不变性离散表示

Itai Gat,Felix Kreuk,Tu Anh Nguyen,Ann Lee,Jade Copet,Gabriel Synnaeve,Emmanuel Dupoux,Yossi Adi

Generative Spoken Language Modeling research focuses on optimizing speech Language Models (LMs) using raw audio recordings without accessing any textual supervision. Such speech LMs usually operate over discrete units obtained from quantizing internal representations of self-supervised models. Although such units show impressive modeling results, their robustness capabilities have not been extensively investigated. This work focuses on improving the robustness of discrete input representations for generative spoken language modeling. First, we formally define how to measure the robustness of such representations to various signal variations that do not alter the spoken information (e.g., time-stretch). Next, we empirically demonstrate how current state-of-the-art representation models lack robustness to such variations. To overcome this, we propose an effective and efficient method to learn robust discrete speech representation for generative spoken language modeling. The proposed approach is based on applying a set of signal transformations to the speech signal and optimizing the model using an iterative pseudo-labeling scheme. Our method significantly improves over the evaluated baselines when considering encoding and modeling metrics. We additionally evaluate our method on the speech-to-speech translation task, considering Spanish-English and French-English translations, and show the proposed approach outperforms the evaluated baselines.

翻译：生成式口语语言建模研究侧重于在无需任何文本监督的情况下，利用原始音频录音优化语音语言模型。这类语音语言模型通常基于从自监督模型内部表示的量化中获得的离散单元运行。尽管这些单元展现出令人印象深刻的建模效果，但其鲁棒性能力尚未得到广泛研究。本文聚焦于提升生成式口语语言建模中离散输入表示的鲁棒性。首先，我们正式定义了如何衡量此类表示对各种不改变口语信息的信号变化（例如时间拉伸）的鲁棒性。接着，我们通过实验证明了当前最先进的表示模型对此类变化缺乏鲁棒性。为克服这一问题，我们提出了一种高效且有效的方法，用于学习面向生成式口语语言建模的鲁棒离散语音表示。所提出的方法基于对语音信号应用一组信号变换，并通过迭代伪标签方案优化模型。在编码与建模指标方面，我们的方法显著优于所评估的基线模型。此外，我们在语音到语音翻译任务上评估了该方法（考虑西班牙语-英语和法语-英语翻译），结果表明所提出的方法优于所评估的基线模型。

0

相关内容

离散化

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

专知会员服务

64+阅读 · 2020年1月11日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

【论文推荐】最新六篇图像描述生成相关论文—视频摘要、注意力张量积、非自回归神经序列模型、副词识别、多主体、多样性度量

【论文推荐】最新六篇图像描述生成相关论文—视频摘要、注意力张量积、非自回归神经序列模型、副词识别、多主体、多样性度量

专知

10+阅读 · 2018年3月2日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

黎曼流形上 Ricci 曲率的几何

国家自然科学基金

3+阅读 · 2015年12月31日

组蛋白甲基转移酶SET1家族蛋白活性调控的结构基础

国家自然科学基金

0+阅读 · 2014年12月31日

黎曼流形的曲率与拓扑关系研究

国家自然科学基金

3+阅读 · 2013年12月31日

几何结构形变空间的几何拓扑

国家自然科学基金

0+阅读 · 2012年12月31日

针尖石墨烯纳米场效应晶体管生物传感器的研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Takagi-Sugeno模型的非线性系统数据驱动最优控制方法适定性的研究

国家自然科学基金

0+阅读 · 2012年12月31日

黎曼流形上几何与拓扑的若干研究

国家自然科学基金

1+阅读 · 2012年12月31日

Dirichlet空间的分析与几何

国家自然科学基金

0+阅读 · 2011年12月31日

积分几何与凸几何分析

国家自然科学基金

2+阅读 · 2009年12月31日

自生成多面体空间结构的几何构成与应用技术的研究

国家自然科学基金

0+阅读 · 2008年12月31日

Improving Text Semantic Similarity Modeling through a 3D Siamese Network

Arxiv

0+阅读 · 2023年7月18日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Arxiv

13+阅读 · 2021年4月7日

Graph Contrastive Learning with Adaptive Augmentation

Arxiv

10+阅读 · 2021年2月26日

Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding

Arxiv

12+阅读 · 2020年4月15日

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Arxiv

15+阅读 · 2020年3月31日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

TinyBERT: Distilling BERT for Natural Language Understanding

TinyBERT: Distilling BERT for Natural Language Understanding

Arxiv

11+阅读 · 2019年9月23日

nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Arxiv

12+阅读 · 2018年9月27日

VIP会员

文章信息

相关主题

语言模型化

最新内容

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

1+阅读 · 今天15:30

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

1+阅读 · 今天15:27

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

0+阅读 · 今天15:00

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

0+阅读 · 今天14:55

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

1+阅读 · 今天8:28

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

4+阅读 · 7月20日

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

6+阅读 · 7月20日

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

6+阅读 · 7月20日

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

6+阅读 · 7月20日

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

专知会员服务

4+阅读 · 7月20日

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

专知会员服务

7+阅读 · 7月20日

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

专知会员服务

7+阅读 · 7月20日

深入Project Maven：为何人工智能在战场上依然失灵

深入Project Maven：为何人工智能在战场上依然失灵

专知会员服务

14+阅读 · 7月19日

锻造未来士兵：外骨骼、基因工程与赛博格

锻造未来士兵：外骨骼、基因工程与赛博格

专知会员服务

7+阅读 · 7月19日

《无人机系统（UAS）通信网状网络试验性部署》50页报告

《无人机系统（UAS）通信网状网络试验性部署》50页报告

专知会员服务

10+阅读 · 7月19日

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

专知会员服务

64+阅读 · 2020年1月11日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

《无人机对海面作战影响评估》

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

【论文推荐】最新六篇图像描述生成相关论文—视频摘要、注意力张量积、非自回归神经序列模型、副词识别、多主体、多样性度量

【论文推荐】最新六篇图像描述生成相关论文—视频摘要、注意力张量积、非自回归神经序列模型、副词识别、多主体、多样性度量

专知

10+阅读 · 2018年3月2日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

Improving Text Semantic Similarity Modeling through a 3D Siamese Network

Arxiv

0+阅读 · 2023年7月18日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Arxiv

13+阅读 · 2021年4月7日

Graph Contrastive Learning with Adaptive Augmentation

Arxiv

10+阅读 · 2021年2月26日

Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding

Arxiv

12+阅读 · 2020年4月15日

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Arxiv

15+阅读 · 2020年3月31日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

TinyBERT: Distilling BERT for Natural Language Understanding

TinyBERT: Distilling BERT for Natural Language Understanding

Arxiv

11+阅读 · 2019年9月23日

nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Arxiv

12+阅读 · 2018年9月27日

相关基金

黎曼流形上 Ricci 曲率的几何

国家自然科学基金

3+阅读 · 2015年12月31日

组蛋白甲基转移酶SET1家族蛋白活性调控的结构基础

国家自然科学基金

0+阅读 · 2014年12月31日

黎曼流形的曲率与拓扑关系研究

国家自然科学基金

3+阅读 · 2013年12月31日

几何结构形变空间的几何拓扑

国家自然科学基金

0+阅读 · 2012年12月31日

针尖石墨烯纳米场效应晶体管生物传感器的研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Takagi-Sugeno模型的非线性系统数据驱动最优控制方法适定性的研究

国家自然科学基金

0+阅读 · 2012年12月31日

黎曼流形上几何与拓扑的若干研究

国家自然科学基金

1+阅读 · 2012年12月31日

Dirichlet空间的分析与几何

国家自然科学基金

0+阅读 · 2011年12月31日

积分几何与凸几何分析

国家自然科学基金

2+阅读 · 2009年12月31日

自生成多面体空间结构的几何构成与应用技术的研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员