Backpropagation, the cornerstone of deep learning, is limited to computing gradients for continuous variables. This limitation poses challenges for problems involving discrete latent variables. To address this issue, we propose a novel approach to approximating the gradient with respect to the parameters involved in generating discrete latent variables. First, we examine the widely used Straight-Through (ST) heuristic and demonstrate that it works as a first-order approximation of the gradient. Guided by our findings, we propose ReinMax, which achieves second-order accuracy by integrating Heun's method, a second-order numerical method for solving ODEs. ReinMax does not require the Hessian or other second-order derivatives and therefore incurs negligible computational overhead. Extensive experimental results on various tasks demonstrate the superiority of ReinMax over the state of the art. Implementations are released at https://github.com/microsoft/ReinMax.
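To make the first-order versus second-order distinction concrete, the following minimal sketch compares the forward Euler method with Heun's method on a simple ODE. It is not the ReinMax estimator and is not taken from the released implementation; it only illustrates the ODE solver that the abstract names. The test problem dy/dt = -y and all function names (`euler_step`, `heun_step`, `integrate`, `decay`) are assumptions chosen for illustration.

```python
import math

def euler_step(f, t, y, h):
    # Forward Euler: a single slope evaluation, first-order accurate.
    return y + h * f(t, y)

def heun_step(f, t, y, h):
    # Heun's method: average the slope at the start of the step and at an
    # Euler-predicted endpoint. Second-order accurate, and it only needs
    # evaluations of f itself, no derivatives of f.
    k1 = f(t, y)
    y_pred = y + h * k1
    k2 = f(t + h, y_pred)
    return y + 0.5 * h * (k1 + k2)

def integrate(step, f, y0, t0, t1, n):
    # Apply n fixed-size steps of the given one-step method from t0 to t1.
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y

def decay(t, y):
    # Illustrative test problem (an assumption): dy/dt = -y, y(0) = 1,
    # with exact solution y(t) = exp(-t).
    return -y

exact = math.exp(-1.0)
errors = {}
for n in (10, 20):
    errors[n] = (
        abs(integrate(euler_step, decay, 1.0, 0.0, 1.0, n) - exact),
        abs(integrate(heun_step, decay, 1.0, 0.0, 1.0, n) - exact),
    )

# Ratio of the error at n=10 to the error at n=20 for each method.
euler_ratio = errors[10][0] / errors[20][0]
heun_ratio = errors[10][1] / errors[20][1]
print(f"Euler error ratio when halving h: {euler_ratio:.2f}")
print(f"Heun error ratio when halving h:  {heun_ratio:.2f}")
```

Halving the step size should roughly halve the error of a first-order method and roughly quarter the error of a second-order one, which is the sense in which Heun's method is "second-order". The extra cost is one additional function evaluation per step, with no second derivatives involved.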