This work introduces the M6(GPT)3 composer system, which generates complete, multi-minute MIDI compositions with complex structures, in any time signature, from natural-language descriptions. The system uses an autoregressive transformer language model to map natural-language prompts to composition parameters in JSON format. The defined structure includes time signature, scales, chord progressions, and valence-arousal values, from which the accompaniment, melody, bass, motif, and percussion tracks are created. We propose a genetic algorithm for generating melodic elements. The algorithm incorporates musically meaningful mutations and a fitness function based on the normal distribution and predefined target values of musical features. These target values evolve adaptively, influenced by the emotional parameters and by distinct playing styles. Percussion in any time signature is generated with probabilistic methods, including Markov chains. Through both human and objective evaluations, we demonstrate that our music generation approach outperforms baselines on specific, musically meaningful metrics, offering a viable alternative to purely neural network-based systems.
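To make the idea of a normal-distribution-based fitness function concrete, the following is a minimal sketch and not the system's actual implementation. The feature names (`mean_interval`, `pitch_range`), their target means and standard deviations, the single-note mutation, and the elitist selection scheme are all assumptions chosen for illustration; the abstract does not specify them. A melody is represented simply as a list of MIDI pitch numbers with at least two notes.

```python
import math
import random

# Hypothetical feature targets as (mean, standard deviation).
# These numbers are illustrative assumptions, not values from the paper.
TARGETS = {
    "mean_interval": (2.0, 1.0),   # average absolute step between notes, in semitones
    "pitch_range": (12.0, 4.0),    # highest minus lowest MIDI pitch
}

def features(melody):
    """Compute the illustrative features for a melody with at least two notes."""
    steps = [abs(b - a) for a, b in zip(melody, melody[1:])]
    return {
        "mean_interval": sum(steps) / len(steps),
        "pitch_range": float(max(melody) - min(melody)),
    }

def fitness(melody, targets=TARGETS):
    """Mean of unnormalised Gaussian scores, one per feature.

    Each score is 1.0 when the feature equals its target mean and decays
    towards 0.0 as the feature moves away from it.
    """
    feats = features(melody)
    scores = [
        math.exp(-0.5 * ((feats[name] - mu) / sigma) ** 2)
        for name, (mu, sigma) in targets.items()
    ]
    return sum(scores) / len(scores)

def mutate(melody, rng):
    """Shift one randomly chosen note by one or two semitones (assumed mutation)."""
    child = list(melody)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-2, -1, 1, 2])
    return child

def evolve(seed_melody, generations=200, population=20, rng=None):
    """Elitist loop: parents and mutated children compete, the fittest survive."""
    rng = rng or random.Random(0)
    pop = [list(seed_melody) for _ in range(population)]
    for _ in range(generations):
        pop += [mutate(m, rng) for m in pop]
        pop.sort(key=fitness, reverse=True)
        pop = pop[:population]
    return pop[0]
```

Calling `evolve([60] * 8)` starts from eight repeated middle-C notes and returns a melody whose fitness is at least that of the seed, because parents are kept alongside their children in every generation. In a system like the one described, the target means would presumably shift with the valence-arousal values and the playing style; here they are fixed constants for simplicity, and pitches are not clamped to the valid MIDI range.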