Automatic chord recognition (ACR) with deep learning models has steadily reached promising recognition accuracy, yet two key challenges remain. First, prior work has focused primarily on audio-domain ACR, while ACR for symbolic music (e.g., scores) has received limited attention because of data scarcity. Second, existing methods still overlook strategies that align with how human musicians analyze music. To address these challenges, we make two contributions: (1) we introduce POP909-CL, an enhanced version of the POP909 dataset with tempo-aligned content and human-corrected labels for chords, beats, keys, and time signatures; and (2) we propose BACHI, a symbolic chord recognition model that decomposes the task into separate decision steps, namely boundary detection followed by iterative ranking of chord root, quality, and bass (inversion). This mechanism mirrors human ear-training practice. Experiments demonstrate that BACHI achieves state-of-the-art chord recognition performance on both classical and pop music benchmarks, and ablation studies validate the effectiveness of each module.
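The decomposition described above can be illustrated with a minimal Python sketch. It is not BACHI's implementation: the function names, the 0.5 boundary threshold, the "most confident component first" ordering rule, and the `toy_scores` stand-in for a trained model are all assumptions made for illustration. The sketch only shows the general shape of the idea: first split the input into chord segments at predicted boundaries, then commit root, quality, and bass one at a time, re-scoring the remaining components after each decision.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]  # component -> {label: probability}
COMPONENTS = ("root", "quality", "bass")


def find_segments(boundary_probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split frames into [start, end) segments wherever a boundary is predicted."""
    if not boundary_probs:
        return []
    # Frame 0 always opens a segment; later frames open one if they pass the threshold.
    starts = [0] + [i for i, p in enumerate(boundary_probs) if i > 0 and p >= threshold]
    ends = starts[1:] + [len(boundary_probs)]
    return list(zip(starts, ends))


def decode_chord(score_fn: Callable[[Dict[str, str]], Scores]) -> Dict[str, str]:
    """Commit root, quality, and bass one at a time, most confident first (assumed rule)."""
    decided: Dict[str, str] = {}
    while len(decided) < len(COMPONENTS):
        scores = score_fn(decided)  # re-score, conditioned on earlier decisions
        # Best (label, probability) pair for every component not yet decided.
        best = {
            c: max(scores[c].items(), key=lambda kv: kv[1])
            for c in COMPONENTS
            if c not in decided
        }
        component = max(best, key=lambda c: best[c][1])
        decided[component] = best[component][0]
    return decided


def toy_scores(decided: Dict[str, str]) -> Scores:
    """Hypothetical stand-in for a model: bass becomes confident once the root is known."""
    if decided.get("root") == "C":
        bass = {"C": 0.9, "E": 0.1}
    else:
        bass = {"C": 0.5, "E": 0.5}
    return {
        "root": {"C": 0.8, "G": 0.2},
        "quality": {"maj": 0.6, "min": 0.4},
        "bass": bass,
    }


if __name__ == "__main__":
    print(find_segments([0.9, 0.1, 0.2, 0.8, 0.1, 0.7]))
    print(decode_chord(toy_scores))
```

In the toy example the root is committed first because it is the most confident component. That decision then sharpens the bass scores, so bass is committed before quality. This is the kind of easy-decisions-first ordering that a human listener might follow, although the ordering rule the actual model uses may differ.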