When parameter-efficient finetuning via LoRA is applied to speaker-adaptive text-to-speech models, adaptation performance may decline compared to fully finetuned counterparts, especially for out-of-domain speakers. We propose VoiceGuider, a parameter-efficient speaker-adaptive text-to-speech system reinforced with autoguidance to enhance speaker adaptation performance and narrow the gap to fully finetuned models. We explore several ways of strengthening autoguidance and identify the most effective strategy among them. As a result, VoiceGuider shows robust adaptation performance, especially on extreme out-of-domain speech data. Audio samples are available on our demo page.