YouTube is a rich source of cover songs. Since the platform itself is organized in terms of videos rather than songs, the retrieval of covers is not trivial. The field of cover song identification addresses this problem and provides approaches that usually rely on audio content. However, including the user-generated video metadata available on YouTube promises improved identification results. In this paper, we propose a multi-modal approach for cover song identification on online video platforms. We combine entity resolution models with audio-based approaches using a ranking model. Our findings indicate that leveraging user-generated metadata can stabilize cover song identification performance on YouTube.