Maintaining numerical stability in machine learning models is crucial for their reliability and performance. One approach to maintaining the stability of a network layer is to integrate the condition number of the weight matrix as a regularizing term into the optimization algorithm. However, due to its discontinuous nature and lack of differentiability, the condition number is not suitable for a gradient descent approach. This paper introduces a novel regularizer that is provably differentiable almost everywhere and promotes matrices with low condition numbers. In particular, we derive a formula for the gradient of this regularizer, which can easily be implemented and integrated into existing optimization algorithms. We demonstrate the advantages of this approach for noisy classification and denoising of MNIST images.
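The abstract does not state the paper's regularizer explicitly, but the differentiability issue it raises can be illustrated on the condition number itself. For the spectral condition number $\kappa(W) = \sigma_1/\sigma_n$, a classical result gives the gradient $\nabla\kappa(W) = \sigma_n^{-1}\, u_1 v_1^{\top} - (\sigma_1/\sigma_n^2)\, u_n v_n^{\top}$ wherever the extreme singular values are simple, i.e. almost everywhere. The following sketch (an illustration of that standard formula, not the paper's proposed regularizer) computes this gradient via the SVD and can be checked against a finite-difference approximation:

```python
import numpy as np

def kappa(W):
    """Spectral condition number sigma_1 / sigma_n of W."""
    s = np.linalg.svd(W, compute_uv=False)
    return s[0] / s[-1]

def condition_number_grad(W):
    """Gradient of kappa(W) = sigma_1 / sigma_n with respect to W.

    Valid wherever the largest and smallest singular values are simple,
    which holds for almost every W.  Uses d(sigma_i)/dW = u_i v_i^T and
    the quotient rule.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s1, sn = s[0], s[-1]
    u1, v1 = U[:, 0], Vt[0, :]
    un, vn = U[:, -1], Vt[-1, :]
    # The sign ambiguity of the SVD cancels in the outer products.
    return np.outer(u1, v1) / sn - (s1 / sn**2) * np.outer(un, vn)
```

In a training loop, such a gradient could be added to the weight update with a regularization weight, steering the layer toward better-conditioned matrices; the paper's own regularizer replaces the raw condition number with a smoother surrogate better suited to gradient descent.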