Embedding discrete solvers as differentiable layers has given modern deep learning architectures combinatorial expressivity and discrete reasoning capabilities. The derivative of these solvers is zero or undefined, so a meaningful replacement is crucial for effective gradient-based learning. Prior works rely on smoothing the solver with input perturbations, relaxing the solver to continuous problems, or interpolating the loss landscape with techniques that typically require additional solver calls, introduce extra hyper-parameters, or compromise performance. We propose a principled approach that exploits the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass, and we provide a theoretical justification for it. Our experiments demonstrate that this straightforward, hyper-parameter-free approach is competitive with previous, more complex methods on numerous tasks, such as backpropagation through discrete samplers, deep graph matching, and image retrieval. Furthermore, we replace the previously proposed problem-specific and label-dependent margin with a generic regularization procedure that prevents cost collapse and increases robustness.
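To make the phrase "negative identity on the backward pass" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy solver that returns the one-hot indicator of the lowest-cost item (an argmin over a linear cost), a squared-error loss on the solver output, and plain gradient descent. The names `solve_argmin`, `backward_negative_identity`, and `train_step` are hypothetical, and the sketch omits the regularization procedure mentioned in the abstract.

```python
def solve_argmin(costs):
    """Toy discrete solver (assumption): one-hot indicator of the lowest-cost item."""
    best = min(range(len(costs)), key=lambda i: costs[i])
    return [1.0 if i == best else 0.0 for i in range(len(costs))]


def backward_negative_identity(grad_output):
    """Stand-in for the solver's Jacobian: pass the incoming gradient through, negated.

    Intuition under the argmin convention: raising the cost of an item makes
    the solver less likely to select it, hence the minus sign.
    """
    return [-g for g in grad_output]


def train_step(costs, target, lr=0.5):
    """One gradient-descent step on the costs using the surrogate backward pass."""
    y = solve_argmin(costs)  # forward pass: a single solver call
    # Gradient of the loss 0.5 * ||y - target||^2 with respect to the solver output y.
    grad_y = [yi - ti for yi, ti in zip(y, target)]
    # Backward pass: no extra solver call and no hyper-parameter.
    grad_costs = backward_negative_identity(grad_y)
    return [c - lr * g for c, g in zip(costs, grad_costs)]


if __name__ == "__main__":
    costs = [1.0, 2.0, 3.0]    # the solver initially selects item 0
    target = [0.0, 0.0, 1.0]   # we would like it to select item 2
    for _ in range(5):
        costs = train_step(costs, target)
    print(costs, solve_argmin(costs))
```

In this toy setting the update raises the cost of the wrongly selected item and lowers the cost of the desired one until the solver output matches the target, after which the gradient is zero and the costs stop changing.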