Generating piano accompaniments in the symbolic music domain is a challenging task that requires producing a complete piece of piano music from given melody and chord constraints, such as those provided by a lead sheet. In this paper, we propose D3PIA, a discrete diffusion-based piano accompaniment generation model that leverages the local alignment between lead sheet and accompaniment in the piano-roll representation. D3PIA uses Neighborhood Attention (NA) both to encode the lead sheet and to condition on it when predicting note states in the piano accompaniment. This design enhances local contextual modeling by efficiently attending to nearby melody and chord conditions. We evaluate our model on the POP909 dataset, a widely used benchmark for piano accompaniment generation. Objective evaluation results show that D3PIA preserves chord conditions more faithfully than continuous diffusion-based and Transformer-based baselines. Furthermore, a subjective listening test indicates that D3PIA generates more musically coherent accompaniments than the comparison models.
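To make the idea of attending only to nearby melody and chord conditions concrete, the following is a minimal NumPy sketch of windowed cross-attention along the time axis of a piano roll. It is an illustration under stated assumptions, not the D3PIA implementation: the function name `neighborhood_cross_attention`, the `radius` parameter, the single-head formulation, and the random feature vectors are all hypothetical. It also simply masks out-of-range frames at the sequence borders, whereas standard Neighborhood Attention shifts the window so that every query sees the same number of neighbors.

```python
import numpy as np


def neighborhood_cross_attention(queries, keys, values, radius):
    """Single-head attention where query frame t only sees key frames
    in [t - radius, t + radius] (hypothetical simplification of NA)."""
    num_frames, dim = queries.shape
    assert keys.shape[0] == num_frames, "queries and keys must be time-aligned"

    # Scaled dot-product similarity between every query and key frame.
    scores = queries @ keys.T / np.sqrt(dim)

    # Mask every key frame farther than `radius` steps from the query frame.
    frame_idx = np.arange(num_frames)
    distance = np.abs(frame_idx[:, None] - frame_idx[None, :])
    scores = np.where(distance <= radius, scores, -np.inf)

    # Row-wise softmax; the diagonal is always unmasked, so each row is finite.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights


# Toy usage: 16 time frames with 8-dimensional features (random stand-ins
# for accompaniment-side queries and lead-sheet-side keys and values).
rng = np.random.default_rng(0)
accompaniment_feats = rng.normal(size=(16, 8))
lead_sheet_feats = rng.normal(size=(16, 8))
output, attn = neighborhood_cross_attention(
    accompaniment_feats, lead_sheet_feats, lead_sheet_feats, radius=2
)
```

Because each frame attends to at most `2 * radius + 1` lead-sheet frames, the useful work grows linearly with sequence length rather than quadratically. This dense masked version still builds the full score matrix for clarity; an efficient implementation would compute only the in-window scores.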