ConTuner: Singing Voice Beautifying with Pitch and Expressiveness Condition

Singing voice beautifying is a novel task with practical value in everyday life: it aims to correct the pitch of a singing voice and improve its expressiveness without changing the original timbre or content. Existing methods either rely on paired data or concentrate only on pitch correction. However, professional and amateur recordings of the same song by the same person are hard to obtain, and singing voice beautifying involves not only pitch correction but also other aspects such as emotion and rhythm. We therefore propose ConTuner, a fast and high-fidelity singing voice beautifying system: a diffusion model combined with a modified condition to generate the beautified Mel-spectrogram, where the modified condition is composed of optimized pitch and expressiveness. For pitch correction, we establish a mapping from MIDI and spectrum envelope to pitch. To make amateur singing more expressive, we propose an expressiveness enhancer that operates in the latent space to convert amateur vocal tone to professional vocal tone. ConTuner achieves a satisfactory beautification effect on both Mandarin and English songs. An ablation study demonstrates that the expressiveness enhancer and the generator-based acceleration method in ConTuner are effective.
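As background for the pitch-correction step, the sketch below shows the standard equal-temperament relation between a MIDI note number and its fundamental frequency, plus a deliberately naive correction that replaces each voiced frame's F0 with the score note's frequency. This is not ConTuner's method: the abstract describes a learned mapping from MIDI and spectrum envelope to pitch, whereas this baseline ignores the spectrum envelope entirely. The function names, the A4 = 440 Hz reference, and the convention that 0 marks an unvoiced frame are assumptions made for illustration.

```python
import math

A4_HZ = 440.0   # assumed tuning reference
A4_MIDI = 69    # MIDI note number of A4

def midi_to_hz(note: float) -> float:
    """Equal-temperament frequency of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)

def hz_to_midi(f0_hz: float) -> float:
    """Fractional MIDI note number of a frequency (f0_hz must be > 0)."""
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)

def cents_off(f0_hz: float, target_note: int) -> float:
    """Deviation of a sung frequency from the score note, in cents."""
    return 100.0 * (hz_to_midi(f0_hz) - target_note)

def naive_correct(f0_track, target_notes):
    """Hypothetical baseline: snap each voiced frame to the score pitch.

    f0_track: per-frame F0 in Hz, with 0 meaning unvoiced (assumption).
    target_notes: per-frame MIDI note numbers taken from the score.
    """
    return [midi_to_hz(n) if f > 0 else 0.0
            for f, n in zip(f0_track, target_notes)]
```

Hard snapping of this kind flattens vibrato and note transitions, which is one reason a learned mapping conditioned on more than the score may be preferable.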

