AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to master all of these tasks to satisfy their music processing requirements, especially given the large differences across tasks in music data representations and in model applicability across platforms. Consequently, it is necessary to build a system that organizes and integrates these tasks, helping practitioners automatically analyze their demands and call suitable tools to fulfill them. Inspired by the recent success of large language models (LLMs) in task automation, we develop a system, named MusicAgent, which integrates numerous music-related tools and an autonomous workflow to address user requirements. More specifically, we build 1) a toolset that collects tools from diverse sources, including Hugging Face, GitHub, and Web APIs; and 2) an autonomous workflow empowered by LLMs (e.g., ChatGPT) that organizes these tools, automatically decomposes user requests into multiple sub-tasks, and invokes the corresponding music tools. The primary goal of this system is to free users from the intricacies of AI-music tools, enabling them to concentrate on the creative aspect. By letting users combine tools freely and with little effort, the system offers a seamless and enriching music experience.
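The two components described above, a toolset and a workflow that decomposes a request into sub-tasks and invokes the matching tools, can be illustrated with a minimal sketch. This is not MusicAgent's actual implementation: every name here (`Tool`, `Toolset`, `plan`, `handle`, the task labels) is hypothetical, the tools are stub functions, and the LLM planner is replaced by simple keyword rules so that the example runs without any model or network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A registered music tool. All fields and names are hypothetical."""
    task: str                  # task label, e.g. "music-classification"
    source: str                # origin of the tool, e.g. "Hugging Face"
    run: Callable[[str], str]  # stub: maps an input reference to an output


class Toolset:
    """Collects tools from different sources under one task-keyed registry."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.task] = tool

    def get(self, task: str) -> Tool:
        return self._tools[task]


def plan(request: str) -> List[str]:
    """Stand-in for the LLM planner: keyword rules instead of a model call.

    Returns the sub-tasks found in the request, in rule order.
    """
    rules = [
        ("classify", "music-classification"),
        ("synthesize", "timbre-synthesis"),
    ]
    lowered = request.lower()
    return [task for keyword, task in rules if keyword in lowered]


def handle(request: str, data: str, toolset: Toolset) -> List[str]:
    """Decompose the request into sub-tasks and invoke one tool per sub-task."""
    return [toolset.get(task).run(data) for task in plan(request)]


toolset = Toolset()
toolset.register(
    Tool("music-classification", "Hugging Face", lambda x: f"genre label for {x}")
)
toolset.register(
    Tool("timbre-synthesis", "GitHub", lambda x: f"synthesized audio from {x}")
)

results = handle(
    "Classify this clip, then synthesize a new timbre", "clip.wav", toolset
)
```

In a real system the `plan` step would presumably be an LLM call that returns structured sub-tasks, and each `run` would wrap an actual model or API; the registry-plus-dispatch shape is the only part this sketch is meant to convey.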