This paper introduces LLM2Fx-Tools, a multimodal tool-calling framework that generates executable sequences of audio effects (Fx-chains) for music post-production. LLM2Fx-Tools uses a large language model (LLM) to understand audio inputs, select audio effect types, determine their order, and estimate their parameters, guided by chain-of-thought (CoT) planning. We also present LP-Fx, a new instruction-following dataset with structured CoT annotations and tool calls for audio effects modules. Experiments show that LLM2Fx-Tools can infer an Fx-chain and its parameters from pairs of unprocessed and processed audio, enabled by autoregressive sequence modeling, tool calling, and CoT reasoning. We further validate the system in a style transfer setting, where audio effects information is extracted from a reference source and applied to new content. Finally, LLM-as-a-judge evaluation demonstrates that our approach generates appropriate CoT reasoning and responses for music production queries. To our knowledge, this is the first work to apply LLM-based tool calling to audio effects modules, enabling interpretable and controllable music production.