Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution with an iterative solver and differentiates it through the computational path. This work provides a non-asymptotic convergence-rate analysis of this approach on quadratic objectives for gradient descent and the Chebyshev method. We show that, to ensure convergence of the Jacobian, we can either (1) choose a large learning rate, which leads to fast asymptotic convergence but may incur an arbitrarily long burn-in phase, or (2) choose a smaller learning rate, which leads to immediate but slower convergence. We refer to this phenomenon as the curse of unrolling. Finally, we discuss open problems related to this approach, such as deriving a practical update rule for the optimal unrolling strategy and establishing novel connections with the field of Sobolev orthogonal polynomials.
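A minimal sketch of unrolled (forward-mode) differentiation of gradient descent may help make the mechanism concrete. It assumes a hypothetical diagonal quadratic f(x, θ) = ½ θ Σ a_i x_i² − Σ b_i x_i, whose curvature scales with a scalar hyperparameter θ, and a start at x_0 = 0. The objective and the helper names `unrolled_gd_jacobian`, `exact_jacobian` and `jacobian_error` are illustrative assumptions. They are not the paper's setup or code. The derivative dx_t/dθ is propagated alongside the iterates and compared with the exact Jacobian of the minimizer.

```python
from typing import List, Tuple


def unrolled_gd_jacobian(
    a: List[float], b: List[float], theta: float, step: float, n_iter: int
) -> Tuple[List[float], List[float]]:
    """Gradient descent on the hypothetical objective
    f(x, theta) = 0.5 * theta * sum(a_i * x_i**2) - sum(b_i * x_i), started at x_0 = 0,
    with dx_t/dtheta propagated alongside the iterates (forward-mode unrolling).
    Returns (x_T, dx_T/dtheta)."""
    x = [0.0] * len(a)
    dx = [0.0] * len(a)  # x_0 does not depend on theta, so dx_0/dtheta = 0
    for _ in range(n_iter):
        # Update rule: x_{t+1,i} = x_{t,i} - step * (theta * a_i * x_{t,i} - b_i).
        # Differentiating it w.r.t. theta (product rule on theta * x_{t,i}) gives
        # dx_{t+1,i} = (1 - step * theta * a_i) * dx_{t,i} - step * a_i * x_{t,i}.
        # Both right-hand sides below use the *old* x and dx.
        dx = [(1 - step * theta * ai) * dxi - step * ai * xi
              for ai, xi, dxi in zip(a, x, dx)]
        x = [xi - step * (theta * ai * xi - bi) for ai, bi, xi in zip(a, b, x)]
    return x, dx


def exact_jacobian(a: List[float], b: List[float], theta: float) -> List[float]:
    # Minimizer x*_i = b_i / (theta * a_i), hence dx*_i/dtheta = -b_i / (theta**2 * a_i).
    return [-bi / (theta ** 2 * ai) for ai, bi in zip(a, b)]


def jacobian_error(a: List[float], b: List[float], theta: float,
                   step: float, n_iter: int) -> float:
    # Largest coordinate-wise gap between the unrolled and the exact Jacobian.
    _, dx = unrolled_gd_jacobian(a, b, theta, step, n_iter)
    return max(abs(d - e) for d, e in zip(dx, exact_jacobian(a, b, theta)))


if __name__ == "__main__":
    # One stiff coordinate (curvature L = theta * a = 1):
    # compare step 1/L with a step close to 2/L.
    for step in (1.0, 1.98):
        errors = [round(jacobian_error([1.0], [1.0], 1.0, step, t), 3)
                  for t in (0, 1, 2, 10, 50, 500)]
        print(step, errors)
```

In this single-coordinate toy, with q = 1 − step, a short hand derivation gives a Jacobian error of |q^t + t · step · q^(t−1)|. The factor t comes from differentiating q^t with respect to θ. With step 1.0 (q = 0), the unrolled Jacobian is exact from t = 2 onward. With step 1.98 (q = −0.98), the error first grows well above its initial value of 1 (about 15.7 at t = 10) and only decays afterwards. This is a small-scale instance of the trade-off described above. The sketch only illustrates the mechanism. It does not reproduce the paper's bounds or its Chebyshev analysis.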