In speech machine learning, neural network models are typically designed by choosing an architecture with fixed layer sizes and structure. These models are then trained to maximize performance on metrics aligned with the task's objective. While the overall architecture is usually guided by prior knowledge of the task, the sizes of individual layers are often chosen heuristically. However, this approach does not guarantee an optimal trade-off between performance and computational complexity; consequently, post hoc methods such as weight quantization or model pruning are typically employed to reduce computational cost. This limitation arises because stochastic gradient descent (SGD) methods can only optimize differentiable functions, whereas the factors that determine computational complexity, such as layer sizes and floating-point operations per second (FLOP/s), are non-differentiable, and optimizing them directly would require modifying the model structure during training. We propose a reparameterization technique based on feature noise injection that enables joint optimization of performance and computational complexity during training using SGD-based methods. Unlike traditional pruning methods, our approach allows the model size to be dynamically optimized for a target performance-complexity trade-off, without relying on heuristic criteria to select which weights or structures to remove. We demonstrate the effectiveness of our method through three case studies: a synthetic example and two real-world applications, voice activity detection and audio anti-spoofing. The code related to our work is publicly available to encourage further research.
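To make the core idea concrete, the following is a minimal illustrative sketch, not the paper's exact formulation. It assumes a hypothetical per-unit gate logit `alpha_i` for each hidden unit, with keep probability `p_i = sigmoid(alpha_i)`; the expected number of active units, `sum_i p_i`, is then a differentiable proxy for layer size, so plain SGD can shrink it. (In the actual method, features would additionally be multiplied by noisy gate samples during training; here only the differentiable complexity term is shown.)

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
alpha = rng.normal(size=16)  # hypothetical gate logits for a 16-unit layer
lr = 1.0                     # illustrative learning rate

before = sigmoid(alpha).sum()  # expected number of active units at start
for _ in range(50):
    p = sigmoid(alpha)
    # Gradient of the expected-complexity term sum(p) w.r.t. alpha is
    # p * (1 - p); a plain SGD step on this term pushes gates toward zero.
    alpha -= lr * p * (1.0 - p)
after = sigmoid(alpha).sum()   # expected number of active units after SGD

print(before > after)  # the differentiable layer-size proxy has shrunk
```

In a full training loop this complexity term would be weighted against the task loss, so SGD trades off performance against expected model size rather than shrinking the layer unconditionally.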