Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data. Our goal is to improve audio classification accuracy when labeled data is limited. Traditional data augmentation techniques, which apply artificial transformations (e.g., adding random noise or masking segments), struggle to create data that captures the true diversity present in real-world audio. To address this shortcoming, we propose to augment the dataset with synthetic audio generated by text-to-audio (T2A) diffusion models. However, synthesizing effective augmentations is challenging: the generated data must be acoustically consistent with the underlying small-scale dataset, and it must also have sufficient compositional diversity. To overcome the first challenge, we align the generations of the T2A model with the small-scale dataset using preference optimization. This ensures that the acoustic characteristics of the generated data remain consistent with the small-scale dataset. To address the second challenge, we propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models to (1) generate diverse and meaningful audio captions and (2) iteratively refine their quality. The generated captions are then used to prompt the aligned T2A model. We extensively evaluate Synthio on ten datasets and four simulated limited-data settings. Results indicate that our method consistently outperforms all baselines by 0.1%-39% using a T2A model trained only on weakly-captioned AudioSet.
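The alignment step can be illustrated with a small sketch. The abstract says only that preference optimization is used and does not give the objective, so the code below assumes a DPO-style pairwise loss. The function name `preference_loss`, its parameters, and the idea of treating a clip from the small-scale dataset as the preferred sample and a generated clip as the rejected one are all assumptions made for illustration. For a diffusion model the log-likelihood terms would normally be approximated (for example, from denoising errors); here they are plain numbers.

```python
import math


def preference_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """DPO-style loss for one (preferred, rejected) pair.

    logp_*     : log-likelihoods under the model being aligned (hypothetical inputs)
    ref_logp_* : log-likelihoods under a frozen reference copy of the model
    beta       : strength of the preference signal (illustrative default)
    """
    # How much more the aligned model favors the preferred sample over the
    # rejected one, relative to the reference model.
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Assumed pairing: a real clip is "preferred", a generated clip is "rejected".
neutral = preference_loss(0.0, 0.0, 0.0, 0.0)
aligned = preference_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0)
misaligned = preference_loss(-3.0, -1.0, -2.0, -2.0, beta=1.0)
print(neutral, aligned, misaligned)
```

The loss falls as the model assigns relatively more likelihood to the preferred sample than the reference model does, which is the sense in which generations are pulled toward the acoustic characteristics of the small-scale dataset.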

