In recent years, advances in automatic music transcription (AMT) have led to the release of several large-scale datasets of automatically transcribed solo piano music. While these datasets undoubtedly offer extensive material for performance studies, their quality varies substantially. In classical music, performances often differ not only in expressive aspects such as tempo but also in their structural interpretation of the score, including repeat patterns and edition-specific variants. To use large-scale transcribed datasets meaningfully for performance research, transcriptions of the same piece must therefore be grouped by their underlying structural realisation so that comparisons are valid. We address this with sequence-to-sequence alignment followed by hierarchical clustering: we compute pairwise alignments between all transcriptions of a given piece and use the alignment cost and the (dis)similarity of performed sequence lengths as features for grouping, so as to resolve structural mismatches. We propose this approach as a first step towards automatically evaluating large-scale transcribed datasets that lack ground-truth scores and/or audio, shifting the evaluation criterion from truth-based accuracy to musical coherence and plausibility. We demonstrate our score-agnostic approach on around 1,500 transcriptions of 88 compositions from a recently published large-scale transcribed piano performance dataset.
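The grouping step can be illustrated with a minimal sketch. It assumes, hypothetically, that each transcription is reduced to a sequence of MIDI pitch numbers, uses plain dynamic time warping (DTW) as the sequence-to-sequence aligner with a 0/1 pitch-match cost, and combines the normalised alignment cost with a length dissimilarity through an illustrative weight before running average-linkage agglomerative clustering with a distance cut-off. The function names (`dtw_cost`, `pairwise_distances`, `average_linkage`), the cost function, the weight and the threshold are placeholders chosen for illustration, not the study's actual settings.

```python
from itertools import combinations


def dtw_cost(a, b):
    """DTW cost between two pitch sequences (hypothetical 0/1 local cost).

    The accumulated cost is divided by the longer length, so it lies in [0, 1].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if a[i - 1] == b[j - 1] else 1.0
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / max(n, m)


def pairwise_distances(seqs, weight=0.5):
    """Symmetric distance matrix mixing alignment cost and length dissimilarity.

    `weight` is an illustrative assumption, not a value from the study.
    """
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        a, b = seqs[i], seqs[j]
        # 0 for equal lengths; approaches 1 as one sequence dwarfs the other
        length_diss = 1.0 - min(len(a), len(b)) / max(len(a), len(b))
        d = weight * dtw_cost(a, b) + (1 - weight) * length_diss
        dist[i][j] = dist[j][i] = d
    return dist


def average_linkage(dist, threshold):
    """Agglomerative clustering: merge the closest pair (average linkage)
    until the smallest inter-cluster distance exceeds `threshold`."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[x] for j in clusters[y])
            d = total / (len(clusters[x]) * len(clusters[y]))
            if best is None or d < best[0]:
                best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy example: two single-pass renditions and two renditions with a repeat.
base = [60, 62, 64, 65, 67, 65, 64, 62]
no_repeat = base
no_repeat_wrong_note = base[:6] + [63] + base[7:]
with_repeat = base + base
with_repeat_dropped_note = base + base[:-1]

seqs = [no_repeat, no_repeat_wrong_note, with_repeat, with_repeat_dropped_note]
groups = average_linkage(pairwise_distances(seqs), threshold=0.15)
print(groups)  # [[0, 1], [2, 3]]
```

The toy data separates the two structural realisations (with and without the repeat) even though each group contains a transcription error. For real data, a library routine such as `scipy.cluster.hierarchy.linkage(..., method="average")` followed by `fcluster(..., criterion="distance")` on a condensed distance matrix would replace the quadratic loop above.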