Don't speak too fast: The impact of data bias on self-supervised speech models - 专知论文

会员服务 ·

0

有偏 · 有偏数据集 · MoDELS · FAST · Performer ·

2022 年 2 月 17 日

Don't speak too fast: The impact of data bias on self-supervised speech models

翻译：不要说得太快:数据偏差对自我监督的演讲模式的影响

Yen Meng,Yi-Hui Chou,Andy T. Liu,Hung-yi Lee

from arxiv, Accepted by ICASSP 2022

Self-supervised Speech Models (S3Ms) have been proven successful in many speech downstream tasks, like ASR. However, how pre-training data affects S3Ms' downstream behavior remains an unexplored issue. In this paper, we study how pre-training data affects S3Ms by pre-training models on biased datasets targeting different factors of speech, including gender, content, and prosody, and evaluate these pre-trained S3Ms on selected downstream tasks in SUPERB Benchmark. Our experiments show that S3Ms have tolerance toward gender bias. Moreover, we find that the content of speech has little impact on the performance of S3Ms across downstream tasks, but S3Ms do show a preference toward a slower speech rate.

翻译：自我监督的演讲模式(S3Ms)在许多演讲的下游任务(如ASR)中被证明是成功的。然而,培训前数据如何影响S3Ms的下游行为仍然是一个尚未探讨的问题。在本文中,我们研究了培训前数据如何通过针对性别、内容和手工业等不同言论因素的有偏见的数据集的培训前模型影响S3Ms。我们实验显示,S3Ms对性别偏见持容忍态度。此外,我们发现,演讲的内容对S3Ms在下游任务中的性能影响不大,但S3Ms确实表现出倾向于更慢的言语率。

0

相关内容

【西湖大学】图预训练方法体系综述，A Survey of Pre-training on Graphs: Taxonomy, Methods and Applications

【西湖大学】图预训练方法体系综述，A Survey of Pre-training on Graphs: Taxonomy, Methods and Applications

专知会员服务

43+阅读 · 2022年3月25日

最浅显的奇异值分解(SVD)介绍，《Singular Value Decomposition as Simply as Possible》

最浅显的奇异值分解(SVD)介绍，《Singular Value Decomposition as Simply as Possible》

专知会员服务

12+阅读 · 2022年3月14日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年6月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

肺炎链球菌疫苗SPY1的一种免疫保护机制：TGF-β信号通路介导Treg细胞参与保护性免疫

国家自然科学基金

0+阅读 · 2015年12月31日

稀疏支持向量机的理论、算法及应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

稀疏张量学习理论

国家自然科学基金

1+阅读 · 2012年12月31日

压缩子空间聚类理论及其应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

云计算环境下数据中心的power capping关键问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

地球物理数据的正则化与稀疏优化反演方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向属性的CPN建模及On the Fly辅助的测试生成方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

NLS1介导的转录因子Arx在细胞核定位的分子机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

文本语义模型和子空间聚类研究

国家自然科学基金

1+阅读 · 2009年12月31日

Analyzing Gender Representation in Multilingual Models

Arxiv

0+阅读 · 2022年4月20日

On the Locality of Attention in Direct Speech Translation

Arxiv

0+阅读 · 2022年4月19日

Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability

Arxiv

0+阅读 · 2022年4月19日

Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners

Arxiv

0+阅读 · 2022年4月16日

Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

Arxiv

0+阅读 · 2022年4月15日

Vision-and-Language Pretrained Models: A Survey

Vision-and-Language Pretrained Models: A Survey

Arxiv

3+阅读 · 2022年4月15日

A Survey of Knowledge Enhanced Pre-trained Models

Arxiv

28+阅读 · 2021年10月1日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

SiT: Self-supervised vIsion Transformer

Arxiv

19+阅读 · 2021年4月8日

An Attentive Survey of Attention Models

An Attentive Survey of Attention Models

Arxiv

44+阅读 · 2020年12月15日

VIP会员

文章信息

相关主题

有偏数据集

最新内容

美国从乌克兰无人机战争中学习经验

美国从乌克兰无人机战争中学习经验

专知会员服务

6+阅读 · 6月21日

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

专知会员服务

2+阅读 · 6月21日

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

专知会员服务

2+阅读 · 6月21日

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

专知会员服务

13+阅读 · 6月20日

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

5+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

8+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

7+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

9+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

12+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

12+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

8+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

13+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

9+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

22+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

11+阅读 · 6月17日

相关VIP内容

【西湖大学】图预训练方法体系综述，A Survey of Pre-training on Graphs: Taxonomy, Methods and Applications

【西湖大学】图预训练方法体系综述，A Survey of Pre-training on Graphs: Taxonomy, Methods and Applications

专知会员服务

43+阅读 · 2022年3月25日

最浅显的奇异值分解(SVD)介绍，《Singular Value Decomposition as Simply as Possible》

最浅显的奇异值分解(SVD)介绍，《Singular Value Decomposition as Simply as Possible》

专知会员服务

12+阅读 · 2022年3月14日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

美国从乌克兰无人机战争中学习经验

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年6月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Analyzing Gender Representation in Multilingual Models

Arxiv

0+阅读 · 2022年4月20日

On the Locality of Attention in Direct Speech Translation

Arxiv

0+阅读 · 2022年4月19日

Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability

Arxiv

0+阅读 · 2022年4月19日

Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners

Arxiv

0+阅读 · 2022年4月16日

Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

Arxiv

0+阅读 · 2022年4月15日

Vision-and-Language Pretrained Models: A Survey

Vision-and-Language Pretrained Models: A Survey

Arxiv

3+阅读 · 2022年4月15日

A Survey of Knowledge Enhanced Pre-trained Models

Arxiv

28+阅读 · 2021年10月1日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

SiT: Self-supervised vIsion Transformer

Arxiv

19+阅读 · 2021年4月8日

An Attentive Survey of Attention Models

An Attentive Survey of Attention Models

Arxiv

44+阅读 · 2020年12月15日

相关基金

肺炎链球菌疫苗SPY1的一种免疫保护机制：TGF-β信号通路介导Treg细胞参与保护性免疫

国家自然科学基金

0+阅读 · 2015年12月31日

稀疏支持向量机的理论、算法及应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

稀疏张量学习理论

国家自然科学基金

1+阅读 · 2012年12月31日

压缩子空间聚类理论及其应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

云计算环境下数据中心的power capping关键问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

地球物理数据的正则化与稀疏优化反演方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向属性的CPN建模及On the Fly辅助的测试生成方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

NLS1介导的转录因子Arx在细胞核定位的分子机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

文本语义模型和子空间聚类研究

国家自然科学基金

1+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员