This paper considers the reliability of automatic differentiation (AD) for neural networks involving the nonsmooth MaxPool operation. We investigate the behavior of AD across different precision levels (16, 32, and 64 bits) and convolutional architectures (LeNet, VGG, and ResNet) on various datasets (MNIST, CIFAR10, SVHN, and ImageNet). Although AD can be incorrect, recent research has shown that it coincides with the derivative almost everywhere, even in the presence of nonsmooth operations such as MaxPool and ReLU. In practice, however, AD operates on floating-point numbers rather than real numbers, so there is a need to characterize the subsets on which AD can be numerically incorrect. These subsets include a bifurcation zone (where AD is incorrect over the reals) and a compensation zone (where AD is incorrect over floating-point numbers but correct over the reals). Training with SGD, we study the impact of different choices of the nonsmooth Jacobian for the MaxPool function at 16- and 32-bit precision. Our findings suggest that nonsmooth MaxPool Jacobians with lower norms help maintain stable and efficient test accuracy, whereas those with higher norms can result in instability and decreased performance. We also observe that the influence of MaxPool's nonsmooth Jacobians on learning can be mitigated by using batch normalization, Adam-like optimizers, or increased precision.