Differentiable optimization has received significant attention due to its foundational role in neural-network-based machine learning. Existing methods leverage the optimality conditions and the implicit function theorem to obtain the Jacobian matrix of the output, which increases the computational cost and limits the applicability of differentiable optimization. In addition, some non-differentiable constraints pose further challenges for prior differentiable optimization layers. This paper proposes a differentiable layer, named Differentiable Frank-Wolfe Layer (DFWLayer), by rolling out the Frank-Wolfe method, a well-known optimization algorithm that solves constrained optimization problems without projections or Hessian matrix computations, thus leading to an efficient way of dealing with large-scale problems. Theoretically, we establish a bound on the suboptimality gap of the DFWLayer in the context of l1-norm constraints. Experimental assessments demonstrate that the DFWLayer not only attains competitive accuracy in solutions and gradients but also consistently adheres to constraints. Moreover, it surpasses the baselines in both forward and backward computational speeds.
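To make the projection-free idea concrete, the following is a minimal sketch of a rolled-out Frank-Wolfe iteration over an l1-norm ball, written in plain Python. It is an illustration under stated assumptions, not the DFWLayer implementation: the function names `lmo_l1` and `frank_wolfe_l1`, the step size 2/(t+2), the fixed iteration count, and the toy quadratic objective are all hypothetical choices made here. It covers only the forward pass. The hard arg-max inside the linear minimization oracle is piecewise constant, so a differentiable layer would need some additional treatment of that step, which the abstract does not specify.

```python
def lmo_l1(grad, radius):
    """Linear minimization oracle for the l1 ball.

    Returns argmin over {s : ||s||_1 <= radius} of <grad, s>, which is a
    signed, scaled coordinate vector at the largest-magnitude gradient entry.
    """
    i = max(range(len(grad)), key=lambda k: abs(grad[k]))
    s = [0.0] * len(grad)
    if grad[i] != 0.0:
        s[i] = -radius if grad[i] > 0 else radius
    return s


def frank_wolfe_l1(grad_fn, x0, radius, num_steps=200):
    """Roll out num_steps Frank-Wolfe iterations from a feasible x0.

    Each iterate is a convex combination of feasible points, so no
    projection is needed, and only first-order information (grad_fn)
    is used, so no Hessian is formed.
    """
    x = list(x0)
    for t in range(num_steps):
        g = grad_fn(x)
        s = lmo_l1(g, radius)
        gamma = 2.0 / (t + 2.0)  # assumed standard diminishing step size
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


def fw_gap(grad_fn, x, radius):
    """Frank-Wolfe gap <grad, x - s>, an upper bound on suboptimality
    for convex objectives."""
    g = grad_fn(x)
    s = lmo_l1(g, radius)
    return sum(gi * (xi - si) for gi, xi, si in zip(g, x, s))


# Toy problem (assumed for illustration):
# minimize 0.5 * ||x - target||^2 subject to ||x||_1 <= 1.
target = [2.0, 0.5, -0.25]
grad_fn = lambda x: [xi - ti for xi, ti in zip(x, target)]

solution = frank_wolfe_l1(grad_fn, [0.0, 0.0, 0.0], radius=1.0)
l1_norm = sum(abs(v) for v in solution)
gap = fw_gap(grad_fn, solution, radius=1.0)
```

On this toy problem the iterate reaches the vertex (1, 0, 0) of the l1 ball after the first step and stays there, so the constraint holds at every iteration without any projection. In an automatic differentiation framework, the same loop could be unrolled so that gradients flow through the iterates rather than through an implicit-function-theorem solve.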