Current automatic lyrics transcription (ALT) benchmarks focus exclusively on word content and ignore the finer nuances of written lyrics, including formatting and punctuation. This creates a potential misalignment with the creative products of musicians and songwriters as well as with listeners' experiences. For example, line breaks convey information about rhythm, emotional emphasis, rhyme, and high-level structure. To address this issue, we introduce Jam-ALT, a new lyrics transcription benchmark based on the JamendoLyrics dataset. Our contribution is twofold. First, we provide a complete revision of the transcripts, geared specifically towards ALT evaluation, following a newly created annotation guide that unifies the music industry's guidelines and covers aspects such as punctuation, line breaks, spelling, background vocals, and non-word sounds. Second, we provide a suite of evaluation metrics designed, unlike the traditional word error rate, to capture such phenomena. We hope that the proposed benchmark contributes to the ALT task, enabling more precise and reliable assessments of transcription systems and enhancing the user experience in lyrics applications such as subtitle renderings for live captioning or karaoke.
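To illustrate why a word-only error rate can miss formatting differences, the following minimal sketch compares two tokenizations of the same reference and hypothesis. It is not the benchmark's metric suite: the function names, the `<nl>` line-break token, and the tokenization rules are all assumptions made for this example, and the lyric is invented.

```python
import re


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def words_only(text):
    """Hypothetical 'traditional' normalization: lowercase words,
    punctuation and line breaks discarded."""
    return re.findall(r"[a-z0-9']+", text.lower())


def with_formatting(text):
    """Hypothetical formatting-aware tokenization: case is kept,
    punctuation marks and line breaks become tokens of their own."""
    tokens = re.findall(r"\n|[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)
    return ["<nl>" if t == "\n" else t for t in tokens]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Invented example: the hypothesis has the right words but no
# capitalization, punctuation, or line break.
reference = "Oh, the river runs\nDown to the sea"
hypothesis = "oh the river runs down to the sea"

print(error_rate(words_only(reference), words_only(hypothesis)))
print(error_rate(with_formatting(reference), with_formatting(hypothesis)))
```

Under the word-only normalization the two texts are identical, so the error rate is zero. The formatting-aware tokenization instead counts the missing comma, the missing line break, and the two case mismatches as errors.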