This paper studies the reliability of automatic differentiation (AD) for neural networks involving the nonsmooth MaxPool operation. We investigate the behavior of AD across precision levels (16, 32, and 64 bits) and convolutional architectures (LeNet, VGG, and ResNet) on several datasets (MNIST, CIFAR10, SVHN, and ImageNet). Although AD can be incorrect, recent research has shown that it coincides with the derivative almost everywhere, even in the presence of nonsmooth operations such as MaxPool and ReLU. In practice, however, AD operates on floating-point numbers rather than real numbers, so there is a need to explore the subsets on which AD can be numerically incorrect. These subsets include a bifurcation zone (where AD is incorrect over the reals) and a compensation zone (where AD is incorrect over floating-point numbers but correct over the reals). Training with SGD, we study the impact of different choices of the nonsmooth Jacobian of the MaxPool function at 16-bit and 32-bit precision. Our findings suggest that nonsmooth MaxPool Jacobians with lower norms help maintain stable and efficient test accuracy, whereas those with higher norms can result in instability and decreased performance. We also observe that the influence of MaxPool's nonsmooth Jacobians on learning can be reduced by using batch normalization, using Adam-like optimizers, or increasing the precision level.
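To make the notion of a "choice of nonsmooth Jacobian" concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a single pooling window flattened to a list, and the helper names (`to_half`, `maximizers`, `jacobian_first`, `jacobian_uniform`, `norm`) are hypothetical. When several entries of the window tie for the maximum, any convex combination of the one-hot vectors on the tied entries is a valid generalized (Clarke) derivative of the max. The sketch compares two such selections and their Euclidean norms, and shows how rounding to 16 bits can create a tie that does not exist at 64 bits.

```python
import math
import struct


def to_half(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]


def maximizers(window):
    """Indices at which the window attains its maximum."""
    m = max(window)
    return [i for i, v in enumerate(window) if v == m]


def jacobian_first(window):
    """Hypothetical selection: one-hot vector on the first maximizer."""
    idx = maximizers(window)[0]
    return [1.0 if i == idx else 0.0 for i in range(len(window))]


def jacobian_uniform(window):
    """Hypothetical selection: equal weight on every maximizer."""
    ties = maximizers(window)
    return [1.0 / len(ties) if i in ties else 0.0 for i in range(len(window))]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


# Illustrative values only: two entries differ by less than the
# half-precision spacing near 1.0, so they collide after rounding.
window = [1.0, 1.0001, 0.5, 0.25]
half_window = [to_half(v) for v in window]

for name, w in (("float64", window), ("float16", half_window)):
    first, uniform = jacobian_first(w), jacobian_uniform(w)
    print(name, "maximizers:", maximizers(w))
    print("  first  :", first, "norm", norm(first))
    print("  uniform:", uniform, "norm", norm(uniform))
```

With no tie, both selections agree. With a tie among k entries, the one-hot selection has norm 1 and the uniform selection has norm 1/sqrt(k), so the choice made at ties directly changes the norm of the Jacobian that is backpropagated.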