Synthesizing Soundscapes: Leveraging Text-to-Audio Models for Environmental Sound Classification

In the past few years, text-to-audio models have emerged as a significant advancement in automatic audio gener- ation. Although they represent impressive technological progress, the effectiveness of their use in the development of audio applications remains uncertain. This paper aims to investigate these aspects, specifically focusing on the task of classification of environmental sounds. This study analyzes the performance of two different environmental classification systems when data generated from text-to-audio models is used for training. Two cases are considered: a) when the training dataset is augmented by data coming from two different text-to-audio models; and b) when the training dataset consists solely of synthetic audio generated. In both cases, the performance of the classification task is tested on real data. Results indicate that text-to-audio models are effective for dataset augmentation, whereas the performance of the models drops when relying on only generated audio.

翻译：过去几年中，文本到音频模型作为自动音频生成领域的一项重大进步而出现。尽管它们代表了令人印象深刻的技术进展，但其在音频应用开发中的有效性仍不确定。本文旨在探究这些方面，特别聚焦于环境声音分类任务。本研究分析了使用文本到音频模型生成的数据进行训练时，两种不同环境分类系统的性能表现。研究考虑了两种情况：a) 训练数据集通过来自两种不同文本到音频模型的数据进行增强；b) 训练数据集仅由生成的合成音频组成。在两种情况下，分类任务的性能均在真实数据上进行了测试。结果表明，文本到音频模型在数据集增强方面是有效的，而当仅依赖生成音频时，模型性能会下降。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日