Melody extraction is a core task in music information retrieval, and the estimation of pitch, onset and offset are its key sub-tasks. Existing methods have limited accuracy and work for only one type of data, either single-pitch or multi-pitch. In this paper, we propose JEPOO, a highly accurate method for the joint estimation of pitch, onset and offset. We address the challenges of joint learning optimization and of handling both single-pitch and multi-pitch data through a novel model design and a new optimization technique named Pareto modulated loss with loss weight regularization. This is the first method that can accurately handle single-pitch music data, multi-pitch music data, and even a mix of the two. A comprehensive experimental study on a wide range of real datasets shows that JEPOO outperforms state-of-the-art methods by up to 10.6%, 8.3% and 10.3% for the prediction of pitch, onset and offset, respectively, and that JEPOO is robust across various types of data and instruments. The ablation study shows the effectiveness of each component of JEPOO.