Most work in AI music generation has focused on audio, which has seen limited use in the music production industry because of its rigidity. To maximize flexibility while assuming only textual instructions from producers, we are among the first to tackle symbolic music editing. We circumvent the known lack of labeled data by showing that LLMs with zero-shot prompting can effectively edit drum grooves. The key to this success is a creatively designed format that interfaces LLMs with music. To facilitate evaluation, we also provide an evaluation dataset with annotated unit tests that aligns closely with musicians' judgment.
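To make the idea of a text format with unit tests concrete, the following is a minimal sketch under stated assumptions. The grid notation, the instrument labels (`HH`, `SD`, `BD`), and the helper names `parse_groove`, `render_groove`, and `check_kick_on_every_beat` are all hypothetical and chosen for illustration; they are not the format or the tests used in this work, and no LLM is called.

```python
# Hypothetical text grid: one line per instrument, 16 sixteenth-note steps per bar.
# 'x' marks a hit and '-' marks a rest. Illustrative only, not the paper's format.
GROOVE = """\
HH: x-x-x-x-x-x-x-x-
SD: ----x-------x---
BD: x-------x-------
"""

# Stand-in for an LLM's edited output after the instruction
# "put a kick on every beat". It is hard-coded; no model is called here.
EDITED = """\
HH: x-x-x-x-x-x-x-x-
SD: ----x-------x---
BD: x---x---x---x---
"""


def parse_groove(text):
    """Parse the text grid into {instrument: [bool, ...]}."""
    grid = {}
    for line in text.strip().splitlines():
        name, steps = line.split(":")
        grid[name.strip()] = [ch == "x" for ch in steps.strip()]
    return grid


def render_groove(grid):
    """Serialize a grid back to text, e.g. for inclusion in a prompt."""
    lines = [
        f"{name}: " + "".join("x" if hit else "-" for hit in steps)
        for name, steps in grid.items()
    ]
    return "\n".join(lines) + "\n"


def check_kick_on_every_beat(grid):
    """Hypothetical unit test: the bass drum hits on steps 0, 4, 8, and 12."""
    return all(grid["BD"][i] for i in range(0, 16, 4))


original = parse_groove(GROOVE)
edited = parse_groove(EDITED)
```

In this sketch, an edit counts as successful when the check fails on the original groove and passes on the edited one. That is one simple way an annotated unit test could score an edit without requiring labeled training data.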