In neural audio signal processing, pitch conditioning has been used to enhance the performance of synthesizers. However, jointly training pitch estimators and synthesizers is difficult with standard audio-to-audio reconstruction losses, which has led to reliance on external pitch trackers. To address this issue, we propose a spectral loss function, inspired by optimal transport theory, that minimizes the displacement of spectral energy. We validate this approach on an unsupervised autoencoding task that fits a harmonic template to harmonic signals. A lightweight encoder jointly estimates the fundamental frequency and the harmonic amplitudes, and a differentiable harmonic synthesizer reconstructs the signals from these estimates. The proposed approach offers a promising direction for improving unsupervised parameter estimation in neural audio applications.
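To illustrate what "minimizing the displacement of spectral energy" can mean, the following is a minimal sketch rather than the loss defined in this work. It assumes the two magnitude spectra are normalized to unit mass and compared with a one-dimensional Wasserstein-1 distance, computed from the gap between their cumulative sums. The names `spectral_w1` and `peak`, the 10 Hz frequency grid, and the single-peak spectra are hypothetical choices made for illustration only.

```python
import numpy as np


def spectral_w1(mag_a, mag_b, freqs):
    """1-D Wasserstein-1 distance between two magnitude spectra on a shared grid.

    Assumption: this is an illustrative transport-style loss, not the exact
    formulation proposed in the paper.
    """
    # Normalize each spectrum to unit mass so it can be treated as a distribution.
    p = mag_a / np.sum(mag_a)
    q = mag_b / np.sum(mag_b)
    # In one dimension, W1 is the integral over frequency of |CDF_p - CDF_q|.
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))
    # Weight each gap by the spacing to the next frequency bin.
    return float(np.sum(cdf_gap[:-1] * np.diff(freqs)))


freqs = np.arange(0.0, 1000.0, 10.0)  # hypothetical grid: 100 bins, 10 Hz apart


def peak(f0):
    """Toy spectrum with all energy in the bin closest to f0 (hypothetical helper)."""
    spec = np.zeros_like(freqs)
    spec[int(round(f0 / 10.0))] = 1.0
    return spec


target = peak(440.0)
near = spectral_w1(peak(450.0), target, freqs)  # 10 Hz pitch error
far = spectral_w1(peak(600.0), target, freqs)   # 160 Hz pitch error

# Point-wise squared error on the same spectra, for comparison.
l2_near = float(np.sum((peak(450.0) - target) ** 2))
l2_far = float(np.sum((peak(600.0) - target) ** 2))

print(near, far)        # transport-style loss grows with the pitch error
print(l2_near, l2_far)  # point-wise loss is identical for both errors
```

In this toy case the transport-style value scales with how far the spectral peak must move, while the point-wise squared error is the same for a small and a large pitch error because the peaks do not overlap. That is one plausible reading of why a displacement-based loss could give a pitch estimator a more informative training signal.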