Long-context modeling is essential for symbolic music generation, since motif repetition and developmental variation can span thousands of musical events. However, practical composition and performance workflows frequently rely on resource-limited devices (e.g., electronic instruments and portable computers), where memory- and attention-heavy computation is difficult to deploy. We introduce Depth-Structured Music Recurrence (DSMR), a recurrent long-context Transformer for full-piece symbolic music modeling that extends context beyond fixed-length excerpts via segment-level recurrence with detached cross-segment states, featuring a layer-wise memory-horizon schedule that budgets recurrent KV states across depth. DSMR is trained in a single left-to-right pass over each complete composition, akin to how a musician experiences it from beginning to end, while carrying recurrent cross-segment states forward. Within this recurrent framework, we systematically study how depth-wise horizon allocations affect optimization, best-checkpoint perplexity, and efficiency. By allocating different history-window lengths across layers while keeping the total recurrent-state budget fixed, DSMR creates depth-dependent temporal receptive fields within a recurrent attention stack without reducing compute depth. Our main instantiation is a two-scale DSMR schedule that allocates long history windows to the lower layers and a uniform short window to the remaining layers. Experiments on the piano performance dataset MAESTRO demonstrate that two-scale DSMR provides a practical quality--efficiency recipe for full-length long-context symbolic music modeling with recurrent attention under limited computational resources.
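The two-scale schedule and the per-layer memory update can be sketched as follows. This is a minimal illustration under assumed names (`two_scale_horizons`, `update_memory` are hypothetical helpers, not the paper's implementation); it shows how a fixed total recurrent-state budget can be split into long horizons for the lower layers and a uniform short horizon for the rest, and how each layer's memory is extended with detached segment states and truncated to its horizon.

```python
def two_scale_horizons(n_layers, total_budget, n_long, long_h):
    """Assign per-layer memory horizons under a fixed total budget.

    The first n_long (lower) layers each get the long horizon long_h;
    the remaining budget is split uniformly across the other layers,
    so the sum of all horizons equals total_budget.
    """
    remaining = total_budget - n_long * long_h
    n_short = n_layers - n_long
    if remaining < 0 or remaining % n_short != 0:
        raise ValueError("budget does not divide into a uniform short horizon")
    short_h = remaining // n_short
    return [long_h] * n_long + [short_h] * n_short


def update_memory(mem, new_states, horizon):
    """Append the current segment's states and keep only the last `horizon`.

    In a real model, new_states would be detached (gradient-stopped)
    hidden/KV states so that gradients do not flow across segments.
    """
    return (mem + new_states)[-horizon:]
```

For example, with 12 layers, a total budget of 4608 recurrent states, and 2 lower layers at horizon 1024, the remaining 10 layers each get a uniform short horizon of 256. During training, each segment's states are appended to the per-layer memory and the oldest entries beyond that layer's horizon are dropped.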