The idea of embedding optimization problems into deep neural networks as optimization layers, in order to encode constraints and inductive priors, has taken hold in recent years. Most existing methods implicitly differentiate the Karush-Kuhn-Tucker (KKT) conditions, which requires expensive computations on the Jacobian matrix and can be slow and memory-intensive. In this paper, we develop a new framework, named Alternating Differentiation (Alt-Diff), that differentiates optimization problems (here, specifically convex optimization problems with polyhedral constraints) in a fast and recursive way. Alt-Diff decouples the differentiation procedure into a primal update and a dual update that are applied alternately. As a result, Alt-Diff substantially reduces the dimensions of the Jacobian matrix, especially for problems with large-scale constraints, and thus speeds up implicit differentiation. We show that the gradients obtained by Alt-Diff are consistent with those obtained by differentiating the KKT conditions. In addition, we propose truncating Alt-Diff to accelerate computation further. Under some standard assumptions, we show that the truncation error of the gradients is upper bounded by a quantity of the same order as the estimation error of the variables. Therefore, Alt-Diff can be truncated to gain speed without sacrificing much accuracy. A series of comprehensive experiments validates the superiority of Alt-Diff.
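To make the alternating scheme concrete, the following is a minimal sketch and not the paper's reference implementation. It assumes a quadratic objective with inequality constraints only, an ADMM-style solver with a slack variable, and differentiation with respect to the linear cost term `q`. The function name `alt_diff_qp`, the penalty `rho`, and the iteration count `iters` are hypothetical choices made for illustration. Each primal, slack, and dual update is immediately followed by the derivative of that same update, so the Jacobian is propagated recursively and only an n-by-n linear system is solved per iteration.

```python
import numpy as np

def alt_diff_qp(P, q, G, h, rho=1.0, iters=100):
    """Sketch: solve  min 0.5 x'Px + q'x  s.t.  Gx <= h  with ADMM,
    propagating the Jacobian dx/dq alongside the iterates.
    (Illustrative assumption, not the authors' code.)"""
    n, m = P.shape[0], G.shape[0]
    x, s, nu = np.zeros(n), np.zeros(m), np.zeros(m)   # primal, slack, dual
    Jx, Js, Jnu = np.zeros((n, n)), np.zeros((m, n)), np.zeros((m, n))
    H = P + rho * G.T @ G                              # n x n system matrix
    for _ in range(iters):
        # Primal update and its derivative with respect to q.
        x = np.linalg.solve(H, -(q + G.T @ nu + rho * G.T @ (s - h)))
        Jx = np.linalg.solve(H, -(np.eye(n) + G.T @ Jnu + rho * G.T @ Js))
        # Slack update: projection onto s >= 0, differentiated via its active mask.
        s_free = -nu / rho - (G @ x - h)
        s = np.maximum(0.0, s_free)
        Js = (s_free > 0)[:, None] * (-Jnu / rho - G @ Jx)
        # Dual update and its derivative.
        nu = nu + rho * (G @ x + s - h)
        Jnu = Jnu + rho * (G @ Jx + Js)
    return x, Jx

# Toy problem: min 0.5*||x||^2 + q'x  s.t.  x <= h, whose solution is min(-q, h).
P, G = np.eye(2), np.eye(2)
q, h = np.array([-1.0, -3.0]), np.array([2.0, 2.0])
x_star, J = alt_diff_qp(P, q, G, h)
print(x_star)  # approaches [1, 2]: the second constraint is active
print(J)       # approaches diag(-1, 0): the active coordinate does not react to q
```

Lowering `iters` corresponds to truncating the recursion: the iterates and the Jacobian are then both returned before convergence, which trades accuracy for speed.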