Audio-to-score alignment is a long-standing challenge in music information retrieval and arguably the most widely applicable alignment task for music research. Alignment algorithms match two versions of a piece of music, which requires both versions to be in comparable formats. Audio-to-audio alignment matches audio features; to align audio files with scores, such methods must either synthesize the score or derive audio-like features from it by means of piano rolls or similar feature sequences. Symbolic alignment, by contrast, matches symbolically encoded notes; in an audio-to-score scenario, these notes would be obtained by transcribing the audio file. In this article, we present an algorithm that directly bridges audio-like and symbol-level features. Sequential audio features encoding onsets and spectral activation are matched to score positions by a bespoke dynamic-programming-based matching algorithm derived from symbolic alignment methods. The resulting method is precise, surpassing widely used audio-to-audio approaches based on synthesized scores, and it remains flexible in its digital signal processing components: it can be adapted to diverse timbral characteristics without requiring a separate transcription model. Furthermore, it inherits some of the runtime advantages of symbolic alignment, with an algorithmic complexity that is at worst linear in the length of both the (typically short) symbolic score and the (typically long) audio feature sequence. In the following sections, we provide a detailed description of the algorithm and evaluate its alignment quality on a large-scale dataset of solo piano recordings.
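The abstract does not spell out the matching recursion, so the following is only a generic illustration of the idea of matching score positions to frame-wise audio features by dynamic programming, not the authors' algorithm. It assumes a hypothetical feature layout: the score is reduced to a sequence of pitch sets (one per score onset), and the audio provides a per-frame onset strength plus a per-frame, per-pitch activation. The function name `align_events_to_frames` and the multiplicative match score (onset strength times mean pitch activation) are illustrative assumptions. Each score event is assigned to exactly one frame, with frames strictly increasing.

```python
from typing import List, Sequence, Set


def align_events_to_frames(
    score_events: Sequence[Set[int]],
    onset_strength: Sequence[float],
    activation: Sequence[Sequence[float]],
) -> List[int]:
    """Hypothetical sketch: map each score event to one audio frame.

    score_events[i]: set of pitch indices starting at score onset i (assumed encoding)
    onset_strength[t]: onset feature of audio frame t (assumed feature)
    activation[t][p]: spectral activation of pitch p at frame t (assumed feature)
    Returns one frame index per score event, strictly increasing.
    """
    n, m = len(score_events), len(onset_strength)
    if n == 0:
        return []
    if n > m:
        raise ValueError("need at least as many frames as score events")

    def match(i: int, t: int) -> float:
        # Assumed match score: onset strength times mean activation of the event's pitches.
        pitches = score_events[i]
        mean_act = sum(activation[t][p] for p in pitches) / max(len(pitches), 1)
        return onset_strength[t] * mean_act

    neg_inf = float("-inf")
    # best[i][t]: highest total match score with event i placed exactly at frame t.
    best = [[neg_inf] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]
    for t in range(m):
        best[0][t] = match(0, t)

    for i in range(1, n):
        # Running maximum of best[i - 1][t'] over all t' < t keeps each row O(m).
        run_val, run_arg = neg_inf, -1
        for t in range(m):
            if t > 0 and best[i - 1][t - 1] > run_val:
                run_val, run_arg = best[i - 1][t - 1], t - 1
            if run_arg >= 0:
                best[i][t] = run_val + match(i, t)
                back[i][t] = run_arg

    # Backtrack from the best frame for the last score event.
    t = max(range(m), key=lambda k: best[n - 1][k])
    path = [t]
    for i in range(n - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    return path[::-1]


# Usage: two single-note events whose pitches light up at frames 1 and 4.
events = [{0}, {1}]
onsets = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
acts = [[0, 0], [1, 0], [0, 0], [0, 0], [0, 1], [0, 0]]
print(align_events_to_frames(events, onsets, acts))  # [1, 4]
```

Because the running maximum avoids scanning all earlier frames for every cell, this sketch fills an N × M table in O(N·M) time for bounded chord sizes, i.e. linear in each sequence length; the features and recursion used in the article itself may differ.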