We present ZeroBAS, a neural method for synthesizing binaural audio from monaural recordings and positional information without training on any binaural data. To our knowledge, this is the first published zero-shot neural approach to mono-to-binaural audio synthesis. Specifically, we show that parameter-free geometric time warping and amplitude scaling based on source location suffice to produce an initial binaural synthesis, which can then be refined by iteratively applying a pretrained denoising vocoder. Furthermore, we find that this approach generalizes across room conditions, which we measure by introducing a new dataset, TUT Mono-to-Binaural, for evaluating state-of-the-art mono-to-binaural synthesis methods under unseen conditions. Our zero-shot method is perceptually on par with supervised methods on the standard mono-to-binaural dataset, and even surpasses them on the out-of-distribution TUT Mono-to-Binaural dataset. These results highlight the potential of pretrained generative audio models and zero-shot learning to unlock robust binaural audio synthesis.
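The geometric time warping and amplitude scaling mentioned above can be sketched as follows: each ear receives the mono signal delayed by its distance to the source divided by the speed of sound, and attenuated by a 1/r falloff. This is a minimal illustrative sketch, not the paper's implementation; the function names, the head geometry, and the simple inverse-distance attenuation are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, roughly at room temperature


def warp_and_scale(mono, sr, src_pos, ear_pos):
    """Delay and attenuate a mono signal for one ear based on geometry.

    mono: 1-D float array; sr: sample rate in Hz;
    src_pos, ear_pos: 3-D positions in metres.
    """
    dist = np.linalg.norm(np.asarray(src_pos) - np.asarray(ear_pos))
    delay = dist / SPEED_OF_SOUND * sr  # propagation delay in (fractional) samples
    # Fractional-sample time warp via linear interpolation; zeros before onset.
    t = np.arange(len(mono)) - delay
    warped = np.interp(t, np.arange(len(mono)), mono, left=0.0)
    return warped / max(dist, 1e-3)  # 1/r amplitude scaling, clamped near zero


def mono_to_binaural(mono, sr, src_pos, left_ear, right_ear):
    """Stack the per-ear warped/scaled signals into a 2-channel array."""
    left = warp_and_scale(mono, sr, src_pos, left_ear)
    right = warp_and_scale(mono, sr, src_pos, right_ear)
    return np.stack([left, right])
```

For a source to the listener's right, the right-ear channel produced by this sketch arrives earlier and louder than the left-ear channel, which is the interaural time and level difference cue the initial binaural estimate relies on.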