Singing is one of the most cherished forms of human entertainment. However, creating a beautiful song requires an accompaniment that complements the vocals and matches the song's instrumentation and genre. With advances in deep learning, previous research has generated suitable accompaniments, but the results often lack precise alignment with the desired instrumentation and genre. To address this, we propose a straightforward method that controls the accompaniment through text prompts, enabling the generation of music that both complements the vocals and meets the song's instrumentation and genre requirements. Through extensive experiments, we successfully generate 10-second accompaniments from vocal input and text control.