In "Large Associative Memory Problem in Neurobiology and Machine Learning," Dmitry Krotov and John Hopfield introduced a general technique for the systematic construction of neural ordinary differential equations with a non-increasing energy, or Lyapunov, function. We study this energy function and find that it is vulnerable to the problem of dead neurons. Each point in the state space where a neuron dies is contained in a non-compact region of constant energy. In these flat regions, the energy function alone does not determine all degrees of freedom and, as a consequence, cannot be used to analyze stability or to find steady states or basins of attraction. We perform a direct analysis of the dynamical system and show how to resolve the problems caused by flat directions corresponding to dead neurons: (i) all information about the state vector at a fixed point can be extracted from the energy and the Hessian matrix (of the Lagrange function); (ii) it is enough to analyze stability in the range of the Hessian matrix; (iii) if a steady state touching a flat region is stable, the whole flat region is the basin of attraction. The analysis of the Hessian matrix can be complicated for realistic architectures, so we show that for a slightly altered dynamical system (with the same structure of steady states), one can derive a diverse family of Lyapunov functions that do not have flat regions corresponding to dead neurons. In addition, these energy functions allow one to use Lagrange functions whose Hessian matrices are not necessarily positive definite, and even to consider architectures with non-symmetric feedforward and feedback connections.
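The flat-region problem can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: a single-layer network with two neurons, the Lagrange function L(x) = ½ Σ relu(x_i)², activations g = relu(x), energy E = x·g − L − ½ gᵀWg, and dynamics dx/dt = Wg − x with a symmetric, hypothetical weight matrix `W`. Under these assumptions, a neuron with negative state has zero activation and drops out of the energy entirely.

```python
# Illustrative sketch only: the energy form, the ReLU Lagrange function, and the
# weight matrix W are assumptions chosen to show a flat direction, not the
# construction analyzed in the paper.

def relu(v):
    return max(v, 0.0)

def energy(x, W):
    """E = x.g - L(x) - 0.5 * g^T W g, with g = relu(x), L = 0.5 * sum(relu(x)^2)."""
    n = len(x)
    g = [relu(v) for v in x]
    lagrange = 0.5 * sum(gi * gi for gi in g)
    interaction = 0.5 * sum(g[i] * W[i][j] * g[j] for i in range(n) for j in range(n))
    return sum(xi * gi for xi, gi in zip(x, g)) - lagrange - interaction

def velocity(x, W):
    """Right-hand side of dx/dt = W g - x (time constant set to 1)."""
    n = len(x)
    g = [relu(v) for v in x]
    return [sum(W[i][j] * g[j] for j in range(n)) - x[i] for i in range(n)]

# Hypothetical symmetric weights; neuron 1 is inhibited by neuron 0.
W = [[0.0, -1.0],
     [-1.0, 0.0]]

# Two states that differ only in the dead neuron (second component < 0).
x_a = [1.0, -1.0]
x_b = [1.0, -100.0]

e_a, e_b = energy(x_a, W), energy(x_b, W)
v_a, v_b = velocity(x_a, W), velocity(x_b, W)

print("energies:  ", e_a, e_b)   # identical: the dead direction is flat
print("velocities:", v_a, v_b)   # different: the dynamics still depends on it
```

In this toy setting the energy takes the same value along the whole half-line where the second neuron is negative, which is a non-compact set, while the velocity field still varies along it. This is the sense in which the energy alone does not fix all degrees of freedom.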