We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons. We show that it is possible to compute a good conditioner based only on the input to the respective layer, without substantial computational overhead. The proposed method allows effective training even in small-batch stochastic regimes, which makes it competitive with first-order as well as second-order methods.
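To make the input-only conditioner and the overhead claim concrete, here is a minimal NumPy sketch for a single fully connected layer. It rests on simplifying assumptions that are ours, not the paper's. The conditioner is taken to be the damped inverse of the input second-moment matrix, (XᵀX + λI)⁻¹, and it is applied to the plain weight gradient XᵀG. X is the (b, n) layer input and G is the (b, m) gradient with respect to the layer output. The function names and the damping parameter `lam` are hypothetical. Because XᵀX has rank at most b, the Woodbury identity lets a b×b system stand in for the n×n one.

```python
import numpy as np


def isaac_style_precondition(x, grad_out, lam=1.0):
    """Condition a linear layer's weight gradient using only the layer input.

    Illustrative sketch (assumption, not the paper's exact update rule):
      x        : (b, n) input to the layer for a mini-batch of size b
      grad_out : (b, m) gradient of the loss w.r.t. the layer output
      lam      : hypothetical damping lambda > 0

    Computes (x.T @ x + lam * I_n)^{-1} @ x.T @ grad_out via the Woodbury
    identity (x.T x + lam I_n)^{-1} x.T = x.T (x x.T + lam I_b)^{-1},
    so only a (b, b) system is solved instead of an (n, n) one.
    """
    b = x.shape[0]
    gram = x @ x.T + lam * np.eye(b)         # (b, b) input Gram matrix, O(b^2 n)
    coeff = np.linalg.solve(gram, grad_out)  # (b, m), O(b^3 + b^2 m)
    return x.T @ coeff                       # (n, m), same cost as x.T @ grad_out


def naive_precondition(x, grad_out, lam=1.0):
    """Reference version that solves the full (n, n) system, O(n^3) for the solve."""
    n = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(n), x.T @ grad_out)


rng = np.random.default_rng(0)
b, n, m = 8, 256, 128                        # batch size smaller than layer width
x = rng.standard_normal((b, n))
g = rng.standard_normal((b, m))
fast = isaac_style_precondition(x, g, lam=1.0)
slow = naive_precondition(x, g, lam=1.0)
print(fast.shape, np.allclose(fast, slow))   # (256, 128) True
```

In this formulation the plain gradient XᵀG already costs O(bnm). The extra work is O(b²n) to form XXᵀ and O(b³ + b²m) for the solve. Relative to O(bnm), that is O(b/m + b/n + b²/(nm)), which shrinks as the layer widens for a fixed batch size. This matches the asymptotic behaviour stated above. `np.linalg.solve` is used instead of forming an explicit inverse because it is the numerically safer choice.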