A Query-By-Humming (QBH) system is a special case of music information retrieval in which the input is a user-hummed melody and the output is the original song that contains that melody. A typical QBH system consists of two stages: melody extraction and candidate melody retrieval. For melody extraction, accurate note transcription is the key enabling technology. However, current transcription methods cannot definitively capture the melody or compensate for inaccuracies in user-hummed queries. In this paper, we apply Total Variation Regularization (TVR) to denoise queries. This approach accounts for user error in humming without loss of meaningful data and reliably captures the underlying melody. For candidate melody retrieval, we take a deep learning approach to time series classification using a Fully Convolutional Neural Network. The trained network classifies each incoming query as belonging to one of the target songs. For our experiments, we use Roger Jang's MIR-QBSH dataset, which is the standard MIREX dataset. We demonstrate that including TVR-denoised queries in the training set raises the overall accuracy of the system to 93%, which is higher than that of other state-of-the-art QBH systems.
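The TVR denoising step can be illustrated with a minimal sketch. It assumes the hummed query has already been converted to a pitch contour in semitones, and it solves the standard one-dimensional total variation problem, minimizing 0.5 * sum((x_i - y_i)^2) + lam * sum(|x_{i+1} - x_i|), by projected gradient ascent on the dual variables. The function names `tv_denoise_1d` and `total_variation`, the weight `lam`, the step size, the iteration count, and the sample contour are all illustrative assumptions; the abstract does not specify the formulation, solver, or parameter values used in the paper.

```python
def total_variation(x):
    """Sum of absolute differences between neighbouring samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def tv_denoise_1d(y, lam, n_iter=500, tau=0.25):
    """Hypothetical 1D TV denoiser (not the paper's implementation).

    Minimizes 0.5 * ||x - y||^2 + lam * TV(x) via projected gradient
    on the dual problem. tau = 0.25 is a safe step size because the
    squared norm of the difference operator is below 4.
    """
    y = [float(v) for v in y]
    n = len(y)
    if n < 2 or lam <= 0:
        return y

    # One dual variable per pair of neighbouring samples.
    z = [0.0] * (n - 1)

    def primal(z):
        # Recover the signal from the dual: x = y - D^T z.
        return [
            y[j]
            + (z[j] if j < n - 1 else 0.0)
            - (z[j - 1] if j > 0 else 0.0)
            for j in range(n)
        ]

    for _ in range(n_iter):
        x = primal(z)
        # Gradient step on the dual, then projection onto [-lam, lam].
        z = [
            max(-lam, min(lam, z[i] + tau * (x[i + 1] - x[i])))
            for i in range(n - 1)
        ]
    return primal(z)


if __name__ == "__main__":
    # Assumed example: two hummed notes (about MIDI 60 and 64) with pitch jitter.
    hummed = [60.0, 60.4, 59.7, 60.2, 64.0, 64.3, 63.8, 64.1]
    smooth = tv_denoise_1d(hummed, lam=0.5)
    print([round(v, 2) for v in smooth])
```

In this sketch, a larger `lam` removes more jitter within each note but also shrinks the genuine jumps between notes, so the weight would need to be tuned on real queries.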