Vocal Timbre Effects with Differentiable Digital Signal Processing

From arXiv. Accepted for publication in Proc. DAFx 2023, Copenhagen, Denmark. Sound examples: https://dsuedholt.github.io/ddsp-vocal-effects/ Code: https://github.com/dsuedholt/ddsp_xsynth

We explore two approaches to creatively altering vocal timbre using Differentiable Digital Signal Processing (DDSP). The first approach is inspired by classic cross-synthesis techniques. A pretrained DDSP decoder predicts a filter for a noise source and a harmonic distribution, based on pitch and loudness information extracted from the vocal input. Before synthesis, the harmonic distribution is modified by interpolating between the predicted distribution and the harmonics of the input. We provide a real-time implementation of this approach in the form of a Neutone model. In the second approach, autoencoder models are trained on datasets consisting of both vocal and instrument training data. To apply the effect, the trained autoencoder attempts to reconstruct the vocal input. We find that there is a desirable "sweet spot" during training, where the model has learned to reconstruct the phonetic content of the input vocals, but is still affected by the timbre of the instrument mixed into the training data. After further training, that effect disappears. A perceptual evaluation compares the two approaches. We find that the autoencoder in the second approach is able to reconstruct intelligible lyrical content without any explicit phonetic information provided during training.
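The cross-synthesis step of the first approach (blending the decoder's predicted harmonic distribution with the input's harmonics, then synthesizing) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the code from the linked repository. The functions `interpolate_harmonics` and `harmonic_synth` are hypothetical. The blend is assumed to be a linear interpolation, renormalized per frame. The input harmonics are assumed to be already analyzed into per-harmonic amplitudes. Controls are held constant over each hop, and the filtered-noise branch is omitted. Random arrays stand in for the decoder output.

```python
import numpy as np


def interpolate_harmonics(predicted, source, mix):
    """Blend the decoder's harmonic distribution with the input's harmonics.

    Hypothetical helper. predicted, source: non-negative arrays of shape
    (n_frames, n_harmonics). mix = 0.0 keeps the decoder prediction;
    mix = 1.0 keeps the input harmonics. Linear blending is an assumption.
    """
    predicted = np.asarray(predicted, dtype=float)
    source = np.asarray(source, dtype=float)
    blended = (1.0 - mix) * predicted + mix * source
    # Normalize each frame to sum to 1; overall level comes from the amplitude envelope.
    totals = blended.sum(axis=-1, keepdims=True)
    return blended / np.maximum(totals, 1e-12)


def harmonic_synth(f0, amplitude, distribution, sample_rate, hop_size):
    """Additive synthesis from frame-rate controls (hypothetical helper).

    f0, amplitude: shape (n_frames,); distribution: (n_frames, n_harmonics).
    Controls are held constant over each hop, a simplification of smoother upsampling.
    """
    f0 = np.repeat(np.asarray(f0, dtype=float), hop_size)
    amplitude = np.repeat(np.asarray(amplitude, dtype=float), hop_size)
    distribution = np.repeat(np.asarray(distribution, dtype=float), hop_size, axis=0)

    k = np.arange(1, distribution.shape[-1] + 1)
    # Zero out harmonics at or above Nyquist to avoid aliasing, then renormalize.
    below_nyquist = f0[:, None] * k[None, :] < sample_rate / 2
    weights = distribution * below_nyquist
    weights = weights / np.maximum(weights.sum(axis=-1, keepdims=True), 1e-12)

    # Phase of the fundamental is the running sum of f0; harmonic k runs at k times that.
    phase = 2.0 * np.pi * np.cumsum(f0) / sample_rate
    partials = np.sin(phase[:, None] * k[None, :])
    return amplitude * np.sum(weights * partials, axis=-1)


# Usage with stand-in data: 1 second at 16 kHz, 250 frames of 64 samples each.
sr, hop, n_frames = 16000, 64, 250
rng = np.random.default_rng(0)
predicted = rng.random((n_frames, 60))                     # stand-in for decoder output
source = np.tile(1.0 / np.arange(1, 61), (n_frames, 1))    # stand-in for analyzed input harmonics
dist = interpolate_harmonics(predicted, source, mix=0.3)
audio = harmonic_synth(np.full(n_frames, 220.0), np.full(n_frames, 0.5), dist, sr, hop)
```

Here `mix` plays the role of the interpolation amount: sweeping it from 0 to 1 moves the output from the decoder's timbre toward the harmonic structure of the vocal input, while pitch and loudness still follow the input.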

