Estimating the performance difficulty of a musical score is crucial in music education, where it guides the design of a student's learning curriculum. Although the Music Information Retrieval community has recently shown interest in this task, existing approaches mainly rely on machine-readable scores, leaving the broader case of sheet music images unaddressed. Building on previous work with sheet music images, we use a mid-level representation, the bootleg score, which describes notehead positions relative to staff lines, coupled with a transformer model. We adapt this architecture to our task by introducing an encoding scheme that reduces the encoded sequence length to one-eighth of its original size. For evaluation, we consider five datasets, two of them compiled specifically for this work, which together comprise more than 7500 scores with up to 9 difficulty levels. The results obtained when pretraining the scheme on the IMSLP corpus and fine-tuning it on the considered datasets support the proposal's validity: the best-performing model achieves a balanced accuracy of 40.34\% and a mean squared error of 1.33. Finally, we provide access to our code, data, and models for transparency and reproducibility.
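The abstract reports balanced accuracy and mean squared error without defining them. As a minimal sketch, assuming the standard definitions (balanced accuracy as the mean of per-class recall, and squared error computed on integer difficulty levels), the two metrics could be computed as follows. The function names and the toy labels are illustrative only and are not taken from the released code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true (assumed definition)."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def mean_squared_error(y_true, y_pred):
    """Average squared distance between true and predicted difficulty levels."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example with three difficulty levels (hypothetical data, not from the paper).
y_true = [0, 0, 0, 1, 2, 2]
y_pred = [0, 0, 1, 1, 2, 0]

ba = balanced_accuracy(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"balanced accuracy: {ba:.4f}, MSE: {mse:.4f}")
```

Balanced accuracy weights every difficulty level equally regardless of how many scores it contains, which matters when levels are unevenly populated. The squared error additionally penalizes predictions that land far from the true level more heavily than near misses.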