The field of AI-assisted music creation has made significant strides, yet existing systems often struggle to meet the demands of iterative and nuanced music production. These challenges include providing sufficient control over the generated content and supporting flexible, precise edits. This thesis tackles these issues by introducing a series of advancements that build progressively upon one another, enhancing the controllability and editability of text-to-music generation models.

First, we introduce Loop Copilot, a system that addresses the need for iterative refinement in music creation. Loop Copilot leverages a large language model (LLM) to coordinate multiple specialised AI models, enabling users to generate and refine music interactively through a conversational interface. Central to this system is the Global Attribute Table, which records and maintains key musical attributes throughout the iterative process, ensuring that modifications at any stage preserve the overall coherence of the music. While Loop Copilot excels at orchestrating the music creation process, it does not directly support detailed edits to the generated content.

To overcome this limitation, we present MusicMagus, a solution for editing AI-generated music. MusicMagus introduces a zero-shot text-to-music editing approach that allows specific musical attributes, such as genre, mood, and instrumentation, to be modified without retraining. By manipulating the latent space of pre-trained diffusion models, MusicMagus ensures that these edits are stylistically coherent and that non-targeted attributes remain unchanged. The system is particularly effective at maintaining the structural integrity of the music during edits, but it encounters challenges with more complex, real-world audio scenarios. ...