Sampling, the practice of reusing pieces of existing audio tracks to create new music, is very common in modern music production. In this paper, we tackle the challenging task of automatic sample identification: detecting such sampled content and retrieving the material from which it originates. To do so, we adopt a self-supervised learning approach that leverages a multi-track dataset to create positive pairs of artificial mixes, and we design a novel contrastive learning objective. We show that this method significantly outperforms previous state-of-the-art baselines, is robust across genres, and scales well as the number of noise songs in the reference database increases. In addition, we extensively analyze the contribution of the different components of our training pipeline and highlight, in particular, the need for high-quality separated stems for this task.
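To make the idea of positive pairs built from multi-track data more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a positive pair consists of two artificial mixes obtained by summing complementary subsets of the stems of one track, and it uses the standard InfoNCE loss as a stand-in for the paper's novel objective, which the abstract does not specify. The function names `make_positive_pair` and `info_nce`, the temperature value, and the stem-splitting rule are all hypothetical.

```python
import numpy as np


def make_positive_pair(stems, rng):
    """Build two artificial mixes from one multi-track song.

    stems: array of shape (n_stems, n_samples), with n_stems >= 2.
    Assumption (not from the paper): the stems are split into two
    complementary, non-empty groups and each group is summed into a mix.
    """
    n_stems = stems.shape[0]
    k = rng.integers(1, n_stems)  # size of the first group, in [1, n_stems - 1]
    chosen = rng.choice(n_stems, size=k, replace=False)
    mask = np.zeros(n_stems, dtype=bool)
    mask[chosen] = True
    mix_a = stems[mask].sum(axis=0)
    mix_b = stems[~mask].sum(axis=0)
    return mix_a, mix_b


def info_nce(z_a, z_b, temperature=0.1):
    """Standard InfoNCE loss (a stand-in, not the paper's objective).

    z_a, z_b: arrays of shape (batch, dim); row i of z_a and row i of z_b
    are embeddings of the two mixes of the same song, and every other row
    in the batch acts as a negative.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
stems = rng.standard_normal((4, 16000))  # 4 hypothetical stems of 16000 samples
mix_a, mix_b = make_positive_pair(stems, rng)
```

In a real pipeline the two mixes would be passed through an audio encoder before the loss is computed; here the loss simply takes embedding matrices as input so that the sketch stays self-contained.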