Lyrics generation is a well-known application in natural language generation research, and several previous studies have focused on generating accurate lyrics under precise controls such as keywords and rhymes. However, lyrics imitation, which involves writing new lyrics that imitate the style and content of source lyrics, remains a challenging task because no parallel corpus exists. In this paper, we introduce \textbf{\textit{Sudowoodo}}, a Chinese lyrics imitation system that generates new lyrics from the text of source lyrics. To address the lack of a parallel training corpus for lyrics imitation, we propose a novel framework that constructs a parallel corpus from source lyrics using a keyword-based lyrics model. The resulting pairs \textit{(new lyrics, source lyrics)} are then used to train the lyrics imitation model. During inference, a post-processing module filters and ranks the generated lyrics and selects the highest-quality ones. As a bonus, we incorporate audio information and align the lyrics with the audio to form complete songs. Human evaluation results show that our framework performs better at lyrics imitation. The \textit{Sudowoodo} system and a demo video of the system are available at \href{https://Sudowoodo.apps-hp.danlu.netease.com/}{Sudowoodo} and \href{https://youtu.be/u5BBT_j1L5M}{https://youtu.be/u5BBT\_j1L5M}, respectively.
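The corpus construction and post-processing steps described above can be illustrated with a minimal Python sketch. It is not the Sudowoodo implementation: the frequency-based keyword extractor, the `keyword_model` callable standing in for the keyword-based lyrics model, the `score` function, and all function names are hypothetical, and whitespace tokenization is assumed for simplicity (Chinese text would need a word segmenter).

```python
from collections import Counter
from typing import Callable, List, Tuple


def extract_keywords(lyrics: str, top_k: int = 3) -> List[str]:
    """Hypothetical extractor: the top_k most frequent whitespace-separated tokens."""
    counts = Counter(lyrics.split())
    return [word for word, _ in counts.most_common(top_k)]


def build_parallel_corpus(
    sources: List[str],
    keyword_model: Callable[[List[str]], str],
    top_k: int = 3,
) -> List[Tuple[str, str]]:
    """Build (new lyrics, source lyrics) pairs.

    keyword_model is a stand-in for a keyword-based lyrics generator:
    it receives keywords taken from the source and returns new lyrics.
    """
    pairs = []
    for source in sources:
        keywords = extract_keywords(source, top_k)
        new_lyrics = keyword_model(keywords)
        pairs.append((new_lyrics, source))
    return pairs


def filter_and_rank(
    candidates: List[str],
    source: str,
    score: Callable[[str], float],
    top_n: int = 1,
) -> List[str]:
    """Drop empty, duplicate, and copied candidates, then keep the top_n by score."""
    seen = set()
    kept = []
    for candidate in candidates:
        text = candidate.strip()
        if not text or text == source.strip() or text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return sorted(kept, key=score, reverse=True)[:top_n]


if __name__ == "__main__":
    source = "moon river moon night river moon"
    # Toy generator that only echoes its keywords (assumption for the demo).
    toy_model = lambda keywords: " ".join(keywords) + " song"
    print(build_parallel_corpus([source], toy_model, top_k=2))
    print(filter_and_rank(["a", "", "a", "bbb", source], source, score=len, top_n=2))
```

In a real pipeline, the pairs would serve as training data for the imitation model, and `score` would be replaced by whatever quality measure the post-processing module uses; the abstract does not specify either.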