Piano cover generation aims to automatically transform a pop song into a piano arrangement. While numerous deep learning approaches have been proposed, existing models often fail to maintain structural consistency with the original song, likely due to the absence of beat-aware mechanisms or the difficulty of modeling complex rhythmic patterns. Rhythmic information is crucial, as it defines structural similarity (e.g., tempo in BPM) and directly impacts the overall quality of the generated music. In this paper, we introduce Etude, a three-stage architecture consisting of Extract, strucTUralize, and DEcode stages. By pre-extracting rhythmic information and applying a novel, simplified REMI-based tokenization, our model produces covers that preserve proper song structure, enhance fluency and musical dynamics, and support highly controllable generation through style injection. Subjective evaluations with human listeners show that Etude substantially outperforms prior models, achieving a quality level comparable to that of human composers.