Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution with an iterative solver and differentiates through the solver's computational path. This work provides a non-asymptotic convergence-rate analysis of this approach on quadratic objectives for gradient descent and the Chebyshev method. We show that, to ensure convergence of the Jacobian, we can either 1) choose a large learning rate, which leads to fast asymptotic convergence but may give the algorithm an arbitrarily long burn-in phase, or 2) choose a smaller learning rate, which leads to immediate but slower convergence. We refer to this phenomenon as the curse of unrolling. Finally, we discuss open problems related to this approach, such as deriving a practical update rule for the optimal unrolling strategy and making novel connections with the field of Sobolev orthogonal polynomials.
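The trade-off between the two learning-rate regimes can be seen on a toy problem. The following is a minimal sketch, not the paper's experimental setup. It assumes a two-dimensional diagonal quadratic whose curvatures are shifted by a scalar parameter `theta`, and it uses the classical step sizes 2/(L+μ) and 1/L as stand-ins for the "large" and "smaller" learning rates. The function name `unrolled_jacobian_errors`, the curvature values, and the right-hand side `rhs` are all illustrative assumptions. The right-hand side is deliberately chosen so that most of the solution lies along the highest-curvature direction, where the burn-in hump is most visible.

```python
import math


def unrolled_jacobian_errors(base_curvatures, theta, rhs, step_size, num_steps):
    """Unroll gradient descent on a diagonal quadratic and track the Jacobian error.

    Illustrative objective (an assumption, not taken from the paper):
        f(x, theta) = sum_i 0.5 * (c_i + theta) * x_i**2 - rhs_i * x_i
    so that x*_i = rhs_i / (c_i + theta) and dx*_i/dtheta = -rhs_i / (c_i + theta)**2.
    Returns ||dx_t/dtheta - dx*/dtheta||_2 for t = 0, ..., num_steps.
    """
    curvatures = [c + theta for c in base_curvatures]
    exact_jacobian = [-r / lam**2 for r, lam in zip(rhs, curvatures)]
    x = [0.0] * len(curvatures)    # iterate x_t, started at zero
    jac = [0.0] * len(curvatures)  # dx_t/dtheta; x_0 does not depend on theta
    errors = []
    for _ in range(num_steps + 1):
        errors.append(
            math.sqrt(sum((j - je) ** 2 for j, je in zip(jac, exact_jacobian)))
        )
        # Chain rule applied to the update x <- x - h * ((c + theta) * x - rhs);
        # the Jacobian update must use x before it is overwritten.
        jac = [j - step_size * (xi + lam * j) for j, xi, lam in zip(jac, x, curvatures)]
        x = [xi - step_size * (lam * xi - r) for xi, lam, r in zip(x, curvatures, rhs)]
    return errors


# Hypothetical problem data: curvatures at theta are roughly 0.1 (mu) and 1.0 (L).
base_curvatures = [0.05, 0.95]
theta = 0.05
rhs = [0.001, 1.0]
mu = min(base_curvatures) + theta
L = max(base_curvatures) + theta

errors_large = unrolled_jacobian_errors(base_curvatures, theta, rhs, 2.0 / (mu + L), 100)
errors_small = unrolled_jacobian_errors(base_curvatures, theta, rhs, 1.0 / L, 100)

for t in (0, 1, 5, 20, 100):
    print(f"t={t:3d}  large step: {errors_large[t]:.3e}  small step: {errors_small[t]:.3e}")
```

In this sketch, the Jacobian error under the larger step first grows above its initial value before decaying at the faster rate, while the error under the smaller step never increases but ends up larger after 100 iterations. How long the burn-in lasts depends on the conditioning and on the problem data assumed here.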