In this paper, we design two discriminators inspired by nonlinear dynamical systems, the Multi-Scale Recurrence Discriminator (MSRD) and the Multi-Resolution Lyapunov Discriminator (MRLD), to \textit{explicitly} model the inherent deterministic chaos of speech. MSRD is built on recurrence representations to capture self-similarity dynamics. MRLD is built on Lyapunov exponents to capture nonlinear fluctuations and sensitivity to initial conditions. Through extensive design optimization and the use of depthwise-separable convolutions in the discriminators, our framework surpasses the prior AP-BWE model while reducing the discriminator parameter count by 44$\times$ \textbf{($\sim$22M vs. $\sim$0.48M)}. To the best of our knowledge, this paper is the first to demonstrate how bandwidth extension (BWE) can be supervised by the subtle nonlinear chaotic physics of voiced sound production to achieve a significant reduction in discriminator size.
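To make the notion of a recurrence representation concrete, the following is a minimal sketch of a binary recurrence matrix computed from a time-delay embedding of a one-dimensional signal. It illustrates the general technique only and is not the MSRD implementation; the function names `delay_embed` and `recurrence_matrix`, and the choices of embedding dimension, delay, and threshold `eps`, are assumptions made for this example.

```python
import math


def delay_embed(signal, dim, delay):
    """Time-delay embedding: state i is (s[i], s[i+delay], ..., s[i+(dim-1)*delay])."""
    n_states = len(signal) - (dim - 1) * delay
    if n_states <= 0:
        raise ValueError("signal is too short for this dim/delay")
    return [[signal[i + k * delay] for k in range(dim)] for i in range(n_states)]


def recurrence_matrix(signal, dim=2, delay=1, eps=0.1):
    """Binary recurrence matrix: R[i][j] = 1 when embedded states i and j lie within eps."""
    states = delay_embed(signal, dim, delay)
    n = len(states)
    return [
        [1 if math.dist(states[i], states[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy input (hypothetical): a period-4 sequence standing in for a voiced frame.
signal = [0.0, 1.0, 0.0, -1.0] * 4
R = recurrence_matrix(signal, dim=2, delay=1, eps=0.1)
```

For this periodic toy signal, states that are a multiple of four samples apart coincide, so `R` shows diagonal lines parallel to the main diagonal. A discriminator operating on such representations at several scales could, in principle, pick up the self-similar structure the abstract refers to.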