A Large-scale Dataset for Audio-Language Representation Learning

The AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets. In audio representation learning, however, existing audio-language datasets suffer from limitations such as insufficient volume, simplistic content, and arduous collection procedures. To tackle these challenges, we present an innovative, automatic audio caption generation pipeline built on a series of public tools or APIs, and use it to construct a large-scale, high-quality audio-language dataset, named Auto-ACD, comprising over 1.9M audio-text pairs. To demonstrate the effectiveness of the proposed dataset, we train popular models on it and show performance improvements on various downstream tasks, namely audio-language retrieval, audio captioning, and environment classification. In addition, we establish a novel test set and provide a benchmark for audio-text tasks. The proposed dataset will be released at https://auto-acd.github.io/.

