Existing singing voice synthesis (SVS) systems achieve high-fidelity solo performances, but they are constrained by global timbre control and therefore cannot handle dynamic multi-singer arrangement or vocal texture within a single song. To address this limitation, we propose Tutti, a unified framework for structured multi-singer generation. Specifically, we introduce a Structure-Aware Singer Prompt that enables flexible singer scheduling that evolves with the musical structure, and we propose Complementary Texture Learning via a Condition-Guided VAE to capture implicit acoustic textures (e.g., spatial reverberation and spectral fusion) that complement the explicit controls. Experiments demonstrate that Tutti schedules multiple singers precisely and significantly improves the acoustic realism of choral generation, offering a new paradigm for complex multi-singer arrangement. Audio samples are available at https://annoauth123-ctrl.github.io/Tutii_Demo/.