AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to master all of these tasks to meet their music processing requirements, especially because the tasks differ greatly in how music data is represented and in which platforms their models can run on. Consequently, it is necessary to build a system that organizes and integrates these tasks, helping practitioners automatically analyze their demands and call suitable tools to fulfill them. Inspired by the recent success of large language models (LLMs) in task automation, we develop a system, named MusicAgent, which integrates numerous music-related tools and an autonomous workflow to address user requirements. More specifically, we build 1) a toolset that collects tools from diverse sources, including Hugging Face, GitHub, and Web APIs; and 2) an autonomous workflow empowered by LLMs (e.g., ChatGPT) that organizes these tools, automatically decomposes user requests into multiple sub-tasks, and invokes the corresponding music tools. The primary goal of this system is to free users from the intricacies of AI-music tools, enabling them to concentrate on the creative aspect. By letting users combine tools effortlessly, the system offers a seamless and enriching music experience.
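The two components named in the abstract, a toolset and a workflow that decomposes a request into sub-tasks and invokes tools, can be illustrated with a minimal Python sketch. This is not MusicAgent's actual implementation: the names `Tool`, `build_toolset`, `plan_tasks`, and `handle_request` are hypothetical, the tools are placeholder functions rather than real models, and a simple keyword match stands in for the LLM planner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A music tool registered under a task name (hypothetical structure)."""
    task: str
    source: str  # where the tool comes from, e.g. "Hugging Face" or "GitHub"
    run: Callable[[str], str]


def build_toolset() -> Dict[str, Tool]:
    # Placeholder callables stand in for real models or API calls.
    tools = [
        Tool("music-classification", "Hugging Face", lambda x: f"genre-label({x})"),
        Tool("timbre-synthesis", "GitHub", lambda x: f"synthesized-audio({x})"),
    ]
    return {tool.task: tool for tool in tools}


def plan_tasks(request: str) -> List[str]:
    # Assumption: a keyword match replaces the LLM that would normally
    # decompose the request into sub-tasks.
    keywords = {
        "classify": "music-classification",
        "synthesize": "timbre-synthesis",
    }
    lowered = request.lower()
    hits = [(lowered.find(word), task) for word, task in keywords.items() if word in lowered]
    # Order sub-tasks by where their keyword appears in the request.
    return [task for _, task in sorted(hits)]


def handle_request(request: str, payload: str, toolset: Dict[str, Tool]) -> List[str]:
    # Invoke the tool registered for each planned sub-task, in order.
    return [toolset[task].run(payload) for task in plan_tasks(request)]


if __name__ == "__main__":
    toolset = build_toolset()
    print(handle_request("Synthesize a violin timbre, then classify it", "clip.wav", toolset))
```

In a real system the planner would be an LLM call and each `run` would wrap a model or a web service. The dictionary keyed by task name only illustrates how a planner's output could be mapped onto registered tools.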