Computational harmony analysis is important for music information retrieval (MIR) tasks such as automatic segmentation, corpus analysis, and automatic chord label estimation. However, recent research has shown that musical harmony is inherently ambiguous, which limits inter-rater agreement and imposes a glass ceiling on common metrics such as accuracy. These issues are commonly addressed either in the training data, by creating majority-rule annotations, or during training, by learning soft targets. We propose a novel alternative in which a human and an autoregressive model co-create a harmonic annotation for an audio track. After the model automatically generates harmony predictions, a human sparsely annotates the parts where model confidence is low, and the model then adjusts its predictions under this human guidance. We evaluate our model on a dataset of popular music and show that this human-in-the-loop approach improves harmonic analysis performance over a model-only approach. The human contribution is amplified by the model's second, constrained prediction.
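The predict, query, and re-predict loop described above can be illustrated with a minimal sketch. It is not the model evaluated here: the first-order transition table, greedy decoding, the use of the top normalized score as "confidence", and all names (`decode`, `least_confident`, `co_annotate`, `oracle`) are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def decode(emissions: Matrix, transition: Matrix,
           fixed: Optional[Dict[int, int]] = None) -> Tuple[List[int], List[float]]:
    """Greedy autoregressive decoding: each frame is conditioned on the
    previously chosen label. Frames listed in `fixed` are forced to the
    human-provided label (hypothetical constraint mechanism)."""
    fixed = fixed or {}
    labels: List[int] = []
    confidences: List[float] = []
    prev: Optional[int] = None
    for t, frame in enumerate(emissions):
        # Combine the frame's own evidence with the transition from the previous label.
        scores = [e * (transition[prev][k] if prev is not None else 1.0)
                  for k, e in enumerate(frame)]
        total = sum(scores)
        probs = [s / total for s in scores]
        if t in fixed:
            label, conf = fixed[t], 1.0  # a human label is treated as certain
        else:
            label = max(range(len(probs)), key=probs.__getitem__)
            conf = probs[label]
        labels.append(label)
        confidences.append(conf)
        prev = label
    return labels, confidences


def least_confident(confidences: Sequence[float], budget: int) -> List[int]:
    """Indices of the `budget` frames with the lowest confidence."""
    return sorted(range(len(confidences)), key=confidences.__getitem__)[:budget]


def co_annotate(emissions: Matrix, transition: Matrix,
                oracle: Callable[[int], int], budget: int):
    """First pass, sparse human annotation, then a constrained second pass."""
    first, conf = decode(emissions, transition)
    fixed = {t: oracle(t) for t in least_confident(conf, budget)}
    second, _ = decode(emissions, transition, fixed)
    return first, second, fixed


# Toy data (assumed): two chord classes, four frames, "sticky" transitions.
emissions = [[0.9, 0.1], [0.25, 0.75], [0.3, 0.7], [0.3, 0.7]]
transition = [[0.8, 0.2], [0.2, 0.8]]
reference = [0, 1, 1, 1]  # stands in for the human annotator

first, second, fixed = co_annotate(emissions, transition, reference.__getitem__, budget=1)
```

In this toy setting the first pass stays on chord 0 throughout because of the sticky transitions. A single human label at the least confident frame (frame 1) is enough for the second pass to also switch frames 2 and 3, which is one way the amplification of a sparse human contribution could arise.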