Gradient Descent (GD) and Conjugate Gradient (CG) methods are among the most effective iterative algorithms for solving unconstrained optimization problems, particularly in machine learning and statistical modeling, where they are employed to minimize cost functions. In these algorithms, tunable parameters such as step sizes or conjugate parameters play a crucial role in determining key performance metrics such as runtime and solution quality. In this work, we introduce a framework that models algorithm selection as a statistical learning problem, so that learning complexity can be estimated via the pseudo-dimension of the algorithm group. We first propose a new cost measure for unconstrained optimization algorithms, inspired by the primal-dual integral used in mixed-integer linear programming. Based on this cost measure, we derive an improved upper bound on the pseudo-dimension of the gradient descent algorithm group by discretizing the set of step-size configurations. Moreover, we extend our results from the gradient descent algorithm group to the conjugate gradient algorithm group for the first time, and prove the existence of a learning algorithm that, given a sufficiently large sample size, identifies the optimal algorithm with high probability.
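As a rough illustration of this setup (the quadratic test instances, the step-size grid, and the area-under-the-curve cost below are assumptions for illustration, not the paper's definitions), the following sketch selects a gradient-descent step size from a discretized configuration set by minimizing an empirical, integral-style cost over sample problems, in the spirit of empirical risk minimization for algorithm selection.

```python
import numpy as np

def gd_cost(grad, f, f_star, x0, eta, T=100):
    """Integral-style cost: cumulative optimality gap f(x_t) - f_star over T iterations."""
    x, cost = x0, 0.0
    for _ in range(T):
        cost += f(x) - f_star
        x = x - eta * grad(x)
    return cost

# Toy instance family: f(x) = 0.5 * a * x^2 with minimum value f_star = 0.
rng = np.random.default_rng(0)
instances = [dict(a=a, x0=5.0) for a in rng.uniform(0.5, 2.0, size=20)]

# Discretized step-size configurations (an assumed grid, for illustration only).
etas = np.linspace(0.05, 0.9, 18)

def empirical_cost(eta):
    """Average cost of running GD with step size eta on the sampled instances."""
    return np.mean([
        gd_cost(grad=lambda x, a=inst["a"]: a * x,
                f=lambda x, a=inst["a"]: 0.5 * a * x**2,
                f_star=0.0, x0=inst["x0"], eta=eta)
        for inst in instances
    ])

# Empirical-risk-minimization-style selection over the discretized grid.
best_eta = min(etas, key=empirical_cost)
print(f"empirically selected step size: {best_eta:.3f}")
```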