Symbolic music representation is a fundamental challenge in computational musicology. Grid-based representations preserve the spatial correspondence between pitch and time, but their inherent sparsity makes them inefficient to encode. Discrete-event representations are compact, but they do not adequately capture structural invariance or spatial locality. To address these complementary limitations, we propose Pianoroll-Event, a novel encoding scheme that describes pianoroll representations as sequences of events. It combines the structural properties of grids with the encoding efficiency of events while preserving temporal dependencies and local spatial patterns. Specifically, we design four complementary event types: Frame Events for temporal boundaries, Gap Events for sparse regions, Pattern Events for note patterns, and Musical Structure Events for musical metadata. Pianoroll-Event strikes an effective balance between sequence length and vocabulary size, improving encoding efficiency by 1.36× to 7.16× over representative discrete-sequence methods. Experiments across multiple autoregressive architectures show that models using our representation consistently outperform baselines in both quantitative and human evaluations.
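The abstract does not specify token formats, pattern widths, or how gaps are counted. The toy sketch below is therefore hypothetical and is not the Pianoroll-Event encoding itself. It only illustrates the general idea of describing a binary pianoroll (frames × 128 MIDI pitches) with frame, gap, pattern, and structure events. Every name is an illustrative assumption: `encode`, `decode`, `CHUNK`, `FRAMES_PER_BAR`, and the `FRAME`/`GAP_k`/`PAT_mask`/`BAR` tokens. In this sketch each frame is cut into fixed 8-pitch chunks. Runs of empty chunks collapse into one gap token, and each non-empty chunk becomes a pattern token that stores its bitmask.

```python
from typing import List

PITCHES = 128        # MIDI pitch range 0-127
CHUNK = 8            # hypothetical pattern width: 8 pitches per pattern token
FRAMES_PER_BAR = 16  # hypothetical: 16 frames per bar (sixteenth-note grid in 4/4)


def encode(roll: List[List[int]]) -> List[str]:
    """Toy encoding of a binary pianoroll (frames x 128 pitches) as event tokens."""
    tokens: List[str] = []
    for t, frame in enumerate(roll):
        if t % FRAMES_PER_BAR == 0:
            tokens.append("BAR")  # stand-in for a musical-structure event
        tokens.append("FRAME")    # marks the start of a new time step
        gap = 0
        for start in range(0, PITCHES, CHUNK):
            chunk = frame[start:start + CHUNK]
            mask = sum(bit << i for i, bit in enumerate(chunk))
            if mask == 0:
                gap += 1          # count consecutive empty chunks
                continue
            if gap:
                tokens.append(f"GAP_{gap}")  # skip over the empty region
                gap = 0
            tokens.append(f"PAT_{mask}")     # local note pattern as a bitmask
        # Trailing empty chunks are implicit: the next FRAME closes this frame.
    return tokens


def decode(tokens: List[str]) -> List[List[int]]:
    """Invert the toy encoding back into a binary pianoroll."""
    roll: List[List[int]] = []
    pos = 0
    for tok in tokens:
        if tok == "BAR":
            continue                         # structure tokens carry no notes here
        if tok == "FRAME":
            roll.append([0] * PITCHES)
            pos = 0
        elif tok.startswith("GAP_"):
            pos += int(tok[4:]) * CHUNK      # jump over empty chunks
        elif tok.startswith("PAT_"):
            mask = int(tok[4:])
            for i in range(CHUNK):
                roll[-1][pos + i] = (mask >> i) & 1
            pos += CHUNK
    return roll


# Usage: a C major triad (MIDI 60, 64, 67) followed by one silent frame.
roll = [[0] * PITCHES for _ in range(2)]
for p in (60, 64, 67):
    roll[0][p] = 1
print(encode(roll))  # ['BAR', 'FRAME', 'GAP_7', 'PAT_16', 'PAT_9', 'FRAME']
```

The example shows the trade-off the abstract describes. A 2 × 128 grid collapses to six tokens. The cost is a larger vocabulary, because up to 255 pattern tokens are possible with 8-pitch chunks. Changing `CHUNK` moves the balance between sequence length and vocabulary size.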