The Fisher information matrix characterizes the local geometry of the parameter space of neural networks. It underpins both theoretical insights and practical tools for understanding and optimizing neural networks. Because it is expensive to compute, practitioners often use random estimators and evaluate only the diagonal entries. We examine two popular estimators whose accuracy and sample complexity depend on their associated variances. We derive bounds on these variances and instantiate them for neural networks used in regression and classification. We navigate the trade-offs between the two estimators based on analytical and numerical studies. We find that the variance quantities depend on the non-linearity with respect to different parameter groups and should not be neglected when estimating the Fisher information.
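To make the idea of a random diagonal estimator concrete, here is a minimal sketch. The abstract does not name its two estimators, so this assumes a commonly used pair: the average squared score (gradient of the log-likelihood) and the average negative Hessian diagonal, both with labels sampled from the model. The logistic model, the function name `fisher_diag_estimates`, and the values of `w` and `x` are hypothetical choices for illustration, not the setup of the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fisher_diag_estimates(w, x, n_samples, rng):
    """Toy logistic model p(y=1|x) = sigmoid(w.x) at a single input x.

    Returns (squared-score estimate, negative-Hessian estimate, exact diagonal).
    The choice of these two estimators is an assumption of this sketch.
    """
    p = sigmoid(w @ x)                      # model probability of y = 1
    y = rng.binomial(1, p, size=n_samples)  # labels drawn from the model itself

    # Estimator A: mean of squared score entries, where score = (y - p) * x.
    scores = (y - p)[:, None] * x[None, :]
    est_sq = np.mean(scores ** 2, axis=0)

    # Estimator B: mean of the negative Hessian diagonal of log p(y|x).
    # For this model it equals p * (1 - p) * x**2 for every sampled y.
    neg_hess_diag = np.tile(p * (1 - p) * x ** 2, (n_samples, 1))
    est_hess = neg_hess_diag.mean(axis=0)

    exact = p * (1 - p) * x ** 2            # closed-form Fisher diagonal
    return est_sq, est_hess, exact

rng = np.random.default_rng(0)
w = np.array([0.5, -1.0, 2.0])   # hypothetical parameters
x = np.array([1.0, 0.3, -0.7])   # hypothetical input
est_sq, est_hess, exact = fisher_diag_estimates(w, x, 20000, rng)
print("squared-score estimate:", est_sq)
print("neg-Hessian estimate:  ", est_hess)
print("exact diagonal:        ", exact)
```

In this toy case the logit is linear in the parameters, so the Hessian of the log-likelihood does not depend on the sampled label and the second estimator has no sampling variance, while the first one fluctuates around the exact value. With a non-linear network the Hessian generally does depend on the label, which is in line with the abstract's point that the variances depend on the non-linearity.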