NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics

Large language models (LLMs) prompted with text and audio represent the state of the art in various auditory tasks, including speech, music, and general audio, and show emergent abilities on unseen tasks. However, these capabilities have yet to be fully demonstrated in bioacoustics tasks, such as detecting animal vocalizations in large recordings, classifying rare and endangered species, and labeling context and behavior. These tasks are crucial for conservation, biodiversity monitoring, and the study of animal behavior. In this work, we present NatureLM-audio, the first audio-language foundation model specifically designed for bioacoustics. Our carefully curated training dataset comprises text-audio pairs spanning a diverse range of bioacoustics, speech, and music data, and is designed to address the challenges posed by the limited annotated datasets available in the field. We demonstrate successful transfer of learned representations from music and speech to bioacoustics, and our model shows promising generalization to unseen taxa and tasks. Importantly, we test NatureLM-audio on a novel benchmark (BEANS-Zero), where it sets a new state of the art (SotA) on several bioacoustics tasks, including zero-shot classification of unseen species. To advance bioacoustics research, we also open-source the code for generating training and benchmark data, as well as for training the model.