A single floating-point number may be inadequate for the line search technique in Newton's method. A column vector of the same size as the gradient might do better than a single scalar, since it can accelerate each gradient element at a different rate. Moreover, a square matrix of the same order as the Hessian matrix might help correct the Hessian matrix. Chiang applied something between a column vector and a square matrix, namely a diagonal matrix, to accelerate the gradient, and further proposed a faster gradient variant called the quadratic gradient. In this paper, we present a new way to build a new version of the quadratic gradient. This new quadratic gradient does not satisfy the convergence conditions of the fixed Hessian Newton's method. However, experimental results show that it sometimes converges faster than the original one. Chiang also speculates that there might be a relation between the Hessian matrix and the learning rate of the first-order gradient descent method. We prove that the floating-point number $\frac{1}{\epsilon + \max \{| \lambda_i | \}}$ can be a good learning rate for gradient methods, where $\epsilon$ is a number to avoid division by zero and the $\lambda_i$ are the eigenvalues of the Hessian matrix.
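As a rough illustration of the learning rate $\frac{1}{\epsilon + \max \{| \lambda_i | \}}$, the sketch below runs plain gradient descent on a small convex quadratic $f(x) = \frac{1}{2} x^\top A x - b^\top x$, whose Hessian is the constant matrix $A$. Everything specific here is an assumption made for the example and is not taken from the paper: the minimization setting, the matrix `A`, the vector `b`, the value of `eps`, and the helper names `matvec`, `max_abs_eigenvalue`, and `gradient_descent`. The largest absolute eigenvalue is estimated by power iteration, which is one possible choice among several.

```python
import math


def matvec(A, x):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def max_abs_eigenvalue(A, iters=200):
    """Estimate max |lambda_i| of a symmetric matrix A by power iteration."""
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        w = matvec(A, v)
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            return 0.0
        v = [wi / norm for wi in w]
        # Once v has unit length, ||A v|| approaches max |lambda_i|.
        lam = norm
    return lam


def gradient_descent(A, b, x0, lr, steps=100):
    """Minimize 0.5 * x^T A x - b^T x with a fixed learning rate."""
    x = list(x0)
    for _ in range(steps):
        grad = [gi - bi for gi, bi in zip(matvec(A, x), b)]  # A x - b
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


# Hypothetical test problem: the Hessian of f is the constant matrix A.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
eps = 1e-8  # assumed value; only guards against division by zero

lam_max = max_abs_eigenvalue(A)
lr = 1.0 / (eps + lam_max)
x_min = gradient_descent(A, b, [0.0, 0.0], lr)
print(lr, x_min)
```

For this quadratic the choice keeps every factor $1 - \text{lr} \cdot \lambda_i$ in $[0, 1)$, so the iterates contract toward the solution of $Ax = b$ without oscillating. For a non-quadratic objective the Hessian changes from point to point, and the eigenvalue bound would have to hold over the whole region of interest.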