MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these tasks well enough to meet their music processing needs, especially given how widely the tasks differ in their representations of music data and in model applicability across platforms. Consequently, it is necessary to build a system that organizes and integrates these tasks, and thus helps practitioners automatically analyze their needs and call suitable tools to fulfill them. Inspired by the recent success of large language models (LLMs) in task automation, we develop a system, named MusicAgent, which integrates numerous music-related tools and an autonomous workflow to address user requirements. More specifically, we build 1) a toolset that collects tools from diverse sources, including Hugging Face, GitHub, and Web APIs; and 2) an autonomous workflow empowered by LLMs (e.g., ChatGPT) that organizes these tools, automatically decomposes user requests into multiple sub-tasks, and invokes the corresponding music tools. The primary goal of this system is to free users from the intricacies of AI-music tools, enabling them to concentrate on the creative aspect. By letting users combine tools freely and with little effort, the system aims to offer a seamless and enriching music experience.
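
The two components described in the abstract, a toolset and a workflow that decomposes a request into sub-tasks and invokes the matching tools, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Tool` record, the task names, the lambda stand-ins for real models, and the keyword-based `plan` function, which takes the place of the LLM planner. It is not MusicAgent's actual code or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical tool record; field names are illustrative, not MusicAgent's schema.
@dataclass
class Tool:
    task: str                  # sub-task this tool handles
    source: str                # origin of the tool, e.g. "Hugging Face" or "GitHub"
    run: Callable[[str], str]  # stand-in for the real model or API call


def build_toolset() -> Dict[str, Tool]:
    """Collect tools from several sources into one registry keyed by task name."""
    tools = [
        Tool("music-classification", "Hugging Face", lambda x: f"genre-label({x})"),
        Tool("timbre-synthesis", "GitHub", lambda x: f"audio({x})"),
    ]
    return {tool.task: tool for tool in tools}


def plan(request: str) -> List[str]:
    """Stand-in for the LLM planner: map keywords to sub-tasks, in order of mention."""
    keywords = [("classify", "music-classification"), ("synthesize", "timbre-synthesis")]
    text = request.lower()
    found = [(text.find(word), task) for word, task in keywords if word in text]
    return [task for _, task in sorted(found)]


def run_agent(request: str, data: str, toolset: Dict[str, Tool]) -> List[str]:
    """Decompose the request, then invoke the tool registered for each sub-task."""
    results = []
    for task in plan(request):
        tool = toolset[task]
        results.append(f"{task} via {tool.source}: {tool.run(data)}")
    return results


if __name__ == "__main__":
    for line in run_agent("Please classify this clip", "clip.wav", build_toolset()):
        print(line)
```

In a real system the `plan` step would be a prompt to an LLM that returns a structured task list, and each `run` callable would wrap a model or a remote API; the registry-plus-dispatch shape is the only part this sketch is meant to convey.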

