Recent advances in deep learning have enabled the development of highly accurate monocular depth estimation models. However, practitioners and researchers training monocular depth estimation networks have observed not-a-number (NaN) losses, which disrupt gradient-descent optimization. Although several practitioners have reported that NaN losses occur seemingly at random and without an apparent cause, their root cause has not been discussed in the literature. This study conducted an in-depth analysis of the NaN losses that arise during the training of a monocular depth estimation network and identified three vulnerabilities that cause them: 1) the use of a square-root loss, which leads to unstable gradients; 2) the log-sigmoid function, which suffers from numerical instability; and 3) certain variance implementations, which compute incorrect values. Furthermore, for each vulnerability, the occurrence of NaN losses was demonstrated and practical guidelines for preventing them were presented. Experiments showed that following these guidelines improves both optimization stability and monocular depth estimation performance.
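To make the three failure modes concrete, here is a minimal pure-Python sketch. It is not the study's code. It uses scalars to emulate IEEE-754 tensor behavior. The function names (`sqrt_grad`, `naive_log_sigmoid`, `stable_log_sigmoid`, `naive_variance`, `two_pass_variance`) and the `eps=1e-8` value are hypothetical illustrations. The stable formulas are standard numerical remedies and may differ from the guidelines the study proposes. Python's `math` module raises exceptions in cases where tensor libraries return `inf` or `NaN`, so the helpers return those values explicitly.

```python
import math

# Vulnerability 1: square-root losses.
def sqrt_grad(v: float, eps: float = 0.0) -> float:
    """Derivative of sqrt(v + eps) w.r.t. v, emulating IEEE-754 tensor semantics."""
    z = v + eps
    if z < 0.0:
        return math.nan   # sqrt of a negative value is NaN
    if z == 0.0:
        return math.inf   # 0.5 / sqrt(0) is inf in float tensors
    return 0.5 / math.sqrt(z)

# Vulnerability 2: log-sigmoid.
def naive_log_sigmoid(x: float) -> float:
    # For very negative x, exp(x) underflows to 0.0 and log(0) is -inf.
    # (For very large x, exp(x) overflows: inf / inf is NaN in tensor
    # libraries, while Python's math.exp raises OverflowError instead.)
    s = math.exp(x) / (1.0 + math.exp(x))
    return math.log(s) if s > 0.0 else -math.inf

def stable_log_sigmoid(x: float) -> float:
    # log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|)); the argument of exp
    # is always <= 0, so it never overflows.
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))

# Vulnerability 3: variance.
def naive_variance(xs: list[float]) -> float:
    # E[x^2] - E[x]^2: catastrophic cancellation when |mean| >> spread;
    # the result can collapse to 0.0 or even come out slightly negative.
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean

def two_pass_variance(xs: list[float]) -> float:
    # Subtract the mean first, then square: a sum of squares is never negative.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) * (x - mean) for x in xs) / n

if __name__ == "__main__":
    xs = [1e9, 1e9 + 1, 1e9 + 2]   # population variance is exactly 2/3
    print(naive_variance(xs))      # 0.0: the spread is lost to rounding
    print(two_pass_variance(xs))   # 0.6666666666666666
    print(sqrt_grad(naive_variance(xs)))             # inf
    print(sqrt_grad(naive_variance(xs), eps=1e-8))   # finite, about 5000
    print(naive_log_sigmoid(-800.0), stable_log_sigmoid(-800.0))  # -inf -800.0
```

The variance example also triggers the first vulnerability. The naive formula loses the spread of values near 1e9 to rounding and returns 0.0. A loss of the form sqrt(Var) then has an infinite gradient at that point. That gradient can become NaN once the chain rule multiplies it by a zero factor, because 0 × inf is NaN in IEEE-754. Adding a small `eps` inside the square root bounds the gradient at 0.5/sqrt(eps). For log-sigmoid, tensor libraries typically ship a fused, stable version (for example, PyTorch's `torch.nn.functional.logsigmoid`). For variance, Welford's online algorithm is a common alternative to the two-pass form.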