The Euclidean distance between wavelet scattering transform coefficients (known as paths) provides informative gradients for perceptual quality assessment of deep inverse problems in computer vision, speech, and audio processing. However, because these transforms comprise numerous paths, they are computationally expensive when employed as differentiable loss functions for stochastic gradient descent, which significantly limits their use in neural network training. To address this problem, we propose "Scattering transform with Random Paths for machine Learning" (SCRAPL): a stochastic optimization scheme for efficient evaluation of multivariable scattering transforms. We implement SCRAPL for the joint time-frequency scattering transform (JTFS), which demodulates spectrotemporal patterns at multiple scales and rates, allowing a fine characterization of intermittent auditory textures. We apply SCRAPL to differentiable digital signal processing (DDSP), specifically to unsupervised sound matching of a granular synthesizer and of the Roland TR-808 drum machine. We also propose an initialization heuristic based on importance sampling, which adapts SCRAPL to the perceptual content of the dataset, improving neural network convergence and evaluation performance. We make our code and audio samples available and provide SCRAPL as a Python package.
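The general idea of replacing a sum over many paths with one randomly drawn path can be illustrated with a minimal sketch. It assumes the full loss is a sum of per-path squared Euclidean distances and uses a generic importance-sampled estimator: draw a path p with probability q_p and divide its distance by q_p, which is unbiased for the full sum. This is not the authors' SCRAPL implementation. The abstract does not specify how SCRAPL samples or weights paths, and the "paths" below are hypothetical toy stand-ins (windowed averages of |x|), not JTFS coefficients. The names `make_toy_path`, `sampled_loss`, and the probabilities in `probs` are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

# A "path" maps a signal to a vector of coefficients. In a real scattering
# transform, each path is a cascade of wavelet filters and modulus operations;
# here we use hypothetical toy stand-ins so the sketch runs without any library.
Path = Callable[[Sequence[float]], List[float]]


def make_toy_path(scale: int) -> Path:
    """Hypothetical stand-in path: mean of |x| over windows of length 2**scale."""
    width = 2 ** scale

    def path(x: Sequence[float]) -> List[float]:
        return [
            sum(abs(v) for v in x[i:i + width]) / width
            for i in range(0, len(x) - width + 1, width)
        ]

    return path


def path_distance(path: Path, x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between the coefficients of a single path."""
    return sum((a - b) ** 2 for a, b in zip(path(x), path(y)))


def full_loss(paths: List[Path], x, y) -> float:
    """Deterministic loss: evaluates every path (the expensive baseline)."""
    return sum(path_distance(p, x, y) for p in paths)


def sampled_loss(paths: List[Path], probs: List[float], x, y,
                 rng: random.Random) -> float:
    """Evaluate one path drawn with probability probs[k], reweighted by 1/probs[k].

    E[d_k / q_k] = sum_k q_k * d_k / q_k = sum_k d_k, so the estimate is
    unbiased for full_loss as long as every probability is positive.
    """
    (k,) = rng.choices(range(len(paths)), weights=probs, k=1)
    return path_distance(paths[k], x, y) / probs[k]


# Usage with toy signals and a hypothetical non-uniform sampling distribution.
x = [math.sin(0.3 * t) for t in range(64)]
y = [math.sin(0.5 * t) for t in range(64)]
paths = [make_toy_path(s) for s in range(4)]
probs = [0.4, 0.3, 0.2, 0.1]
rng = random.Random(0)

exact = full_loss(paths, x, y)
estimate = sum(sampled_loss(paths, probs, x, y, rng) for _ in range(20000)) / 20000
print(exact, estimate)
```

Each call to `sampled_loss` touches only one path, so the per-step cost no longer scales with the number of paths; the price is extra variance. In a differentiable setting (e.g. with an autodiff framework), the gradient of the reweighted sample is likewise unbiased for the full gradient because the probabilities do not depend on the model parameters. A non-uniform `probs` loosely mirrors the role of an importance-sampling initialization, but how SCRAPL actually derives its sampling distribution is not described here.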