In recent years, machine learning, and in particular generative adversarial networks (GANs) and attention-based neural networks (transformers), has been successfully used to compose and generate music, both melodies and polyphonic pieces. Current research focuses foremost on style replication (e.g., generating a Bach-style chorale) or style transfer (e.g., classical to jazz) based on large amounts of recorded or transcribed music, which in turn also allows for fairly straightforward "performance" evaluation. However, most of these models are not suitable for human-machine co-creation through live interaction, nor is it clear how such models and the resulting creations would be evaluated. This article presents a thorough review of music representation, feature analysis, heuristic algorithms, statistical and parametric modelling, and human and automatic evaluation measures, along with a discussion of which approaches and models seem most suitable for live interaction.