We study generative modeling of Bach-style symbolic piano music using a shared MIDI corpus and three model families: autoregressive LSTMs with attention, latent-variable models including recurrent VAEs and vector-quantized VAEs, and generative adversarial networks. We compare their ability to model polyphonic note sequences, learn useful latent representations, and generate stylistically coherent compositions. Our experiments show that the autoregressive LSTM with attention produces the most musically coherent samples, while vector quantization helps mitigate posterior collapse and yields more structured outputs than conventional recurrent VAEs. The adversarial approach captures local pitch patterns but remains difficult to train and generalizes less reliably to Bach's style. These results highlight the relative strengths and failure modes of autoregressive, latent-variable, and adversarial approaches for symbolic music generation.