Speech modeling methods learn one embedding per fixed segment of speech, typically between 10 and 25 ms long. The information present in speech can be divided into two categories: "what is being said" (content) and "how it is expressed" (other). These two are orthogonal in nature, so forcing a single objective to optimize both can drive the optimization algorithm toward a sub-optimal solution, degrading performance on some or all downstream tasks, as shown by previous studies. Current self-supervised learning (SSL) methods such as HuBERT are very good at modeling the content information present in speech. Data augmentation improves performance on tasks that require effective modeling of other information, but it divides the capacity of the model between the two. In this work, we conduct a preliminary study of the importance of modeling other information with separate learnable parameters. We propose a modified version of HuBERT, termed Other HuBERT (O-HuBERT), to test our hypothesis. Our findings are twofold: first, the O-HuBERT method is able to utilize all layers to build complex features that encode other information; second, a robust data augmentation strategy is essential both for learning the information required by tasks that depend on other information and for achieving state-of-the-art (SOTA) performance on the SUPERB benchmark with a similarly sized model (100 million parameters) and the same amount of pre-training data (960 hours).