Music plagiarism detection is attracting growing attention as music production becomes more widespread and society places greater emphasis on intellectual property. We aim to detect fine-grained plagiarism in music pairs, since conventional methods are coarse-grained and do not match real-life scenarios. Because no sizeable dataset has been designed for the music plagiarism task, we establish a large-scale simulated dataset, the Music Plagiarism Detection Dataset (MPD-Set), under the guidance of renowned researchers from national-level professional music institutions. MPD-Set covers diverse real-life music plagiarism cases at the melodic, rhythmic, and tonal levels. We further establish a Real-life Dataset for evaluation, in which all plagiarism pairs are real cases. To detect fine-grained plagiarism pairs effectively, we propose a graph-based method called Bipartite Melody Matching Detector (BMM-Det), which formulates the problem as a maximum matching problem in a bipartite graph. Experimental results on both the simulated and Real-life Datasets demonstrate that BMM-Det outperforms existing plagiarism detection methods and is robust to common plagiarism cases such as transpositions, pitch shifts, duration variance, and melody changes. Datasets and source code are open-sourced at https://github.com/xuan301/BMMDet_MPDSet.
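To make the bipartite formulation concrete, the following is a minimal sketch of how melody comparison can be cast as a maximum matching problem. It is not the authors' BMM-Det implementation: the note representation, the sliding-window segmentation, the interval-based signature, the mismatch threshold, and all function names (`interval_signature`, `windows`, `build_edges`, `maximum_matching`, `plagiarism_score`) are assumptions made purely for illustration. Segments of one melody form the left vertices, segments of the other form the right vertices, and an edge connects two segments whose pitch-interval patterns agree, which makes the comparison insensitive to transposition.

```python
from typing import List, Set, Tuple

# Assumed note representation: (MIDI pitch, duration in beats).
Note = Tuple[int, float]


def interval_signature(segment: List[Note]) -> Tuple[int, ...]:
    """Pitch intervals between consecutive notes (unchanged by transposition)."""
    return tuple(b[0] - a[0] for a, b in zip(segment, segment[1:]))


def windows(melody: List[Note], size: int) -> List[List[Note]]:
    """All contiguous note windows of the given size (hypothetical segmentation)."""
    return [melody[i:i + size] for i in range(len(melody) - size + 1)]


def build_edges(a_segs: List[List[Note]], b_segs: List[List[Note]],
                max_mismatch: int) -> List[List[int]]:
    """edges[i] lists the segments of B that are similar enough to segment i of A."""
    edges = []
    for seg_a in a_segs:
        sig_a = interval_signature(seg_a)
        row = []
        for j, seg_b in enumerate(b_segs):
            sig_b = interval_signature(seg_b)
            mismatches = sum(x != y for x, y in zip(sig_a, sig_b))
            if mismatches <= max_mismatch:
                row.append(j)
        edges.append(row)
    return edges


def maximum_matching(edges: List[List[int]], n_right: int) -> Tuple[int, List[int]]:
    """Maximum-cardinality bipartite matching via augmenting paths (Kuhn's algorithm)."""
    match_right = [-1] * n_right  # match_right[v] = left vertex matched to v

    def try_augment(u: int, seen: Set[int]) -> bool:
        for v in edges[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-routed.
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    size = sum(1 for u in range(len(edges)) if try_augment(u, set()))
    return size, match_right


def plagiarism_score(a: List[Note], b: List[Note],
                     size: int = 4, max_mismatch: int = 0) -> float:
    """Fraction of segments of the shorter melody that can be matched one-to-one."""
    a_segs, b_segs = windows(a, size), windows(b, size)
    if not a_segs or not b_segs:
        return 0.0
    edges = build_edges(a_segs, b_segs, max_mismatch)
    matched, _ = maximum_matching(edges, len(b_segs))
    return matched / min(len(a_segs), len(b_segs))


original = [(60, 1.0), (62, 1.0), (64, 1.0), (65, 1.0), (67, 1.0), (69, 1.0)]
transposed = [(pitch + 3, dur) for pitch, dur in original]  # shifted up 3 semitones
print(plagiarism_score(original, transposed))
```

In this toy setup a transposed copy scores as a full match because only pitch intervals are compared. A more faithful treatment would presumably weight the edges by a similarity that also accounts for durations and pitch shifts, and then solve a maximum weight matching rather than the unweighted variant sketched here.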