A Large-scale Dataset for Audio-Language Representation Learning

The AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets. However, in the audio representation learning community, the present audio-language datasets suffer from limitations such as insufficient volume, simplistic content, and arduous collection procedures. To tackle these challenges, we present an innovative and automatic audio caption generation pipeline based on a series of public tools or APIs, and construct a large-scale, high-quality, audio-language dataset, named as Auto-ACD, comprising over 1.9M audio-text pairs. To demonstrate the effectiveness of the proposed dataset, we train popular models on our dataset and show performance improvement on various downstream tasks, namely, audio-language retrieval, audio captioning, environment classification. In addition, we establish a novel test set and provide a benchmark for audio-text tasks. The proposed dataset will be released at https://auto-acd.github.io/.

翻译：人工智能社区在开发强大基础模型方面取得了显著进展，这得益于大规模多模态数据集的推动。然而，在音频表征学习领域，当前现有的音频-语言数据集存在数据量不足、内容单一、采集流程繁琐等局限性。为解决上述挑战，我们提出了一种基于系列公共工具或API的创新性自动音频描述生成流程，并构建了名为Auto-ACD的大规模高质量音频-语言数据集，包含超过190万条音频-文本对。为验证所提数据集的有效性，我们在该数据集上训练主流模型，并在音频-语言检索、音频描述生成、环境分类等多个下游任务中展示了性能提升。此外，我们构建了全新的测试集，并为音频-文本任务提供了基准测评。所提数据集将在https://auto-acd.github.io/开放获取。

相关内容

数据集

关注 88

数据集，又称为资料集、数据集合或资料集合，是一种由数据所组成的集合。
Data set（或dataset）是一个数据的集合，通常以表格形式出现。每一列代表一个特定变量。每一行都对应于某一成员的数据集的问题。它列出的价值观为每一个变量，如身高和体重的一个物体或价值的随机数。每个数值被称为数据资料。对应于行数，该数据集的数据可能包括一个或多个成员。

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日