Of all the vector fields surrounding the minima of recurrent learning setups, the gradient field, with its exploding and vanishing updates, appears to be a poor choice for optimization, offering little beyond efficient computability. We seek to improve on this suboptimal practice in the context of physics simulations, where backpropagating feedback through many unrolled time steps is considered crucial for acquiring temporally coherent behavior. The alternative vector field we propose follows from two principles: physics simulators, unlike neural networks, have a balanced gradient flow, and certain modifications to the backpropagation pass leave the positions of the original minima unchanged. Because any modification of backpropagation decouples the forward and backward passes, the rotation-free character of the gradient field is lost. We therefore discuss the negative implications of using such a rotational vector field for optimization and how to counteract them. Our final procedure is easy to implement via a sequence of gradient-stopping and component-wise comparison operations, which do not negatively affect scalability. Our experiments on three control problems show that, especially as the complexity of each task increases, the unbalanced updates from the gradient can no longer provide the precise control signals required, whereas our method still solves the tasks. Our code can be found at https://github.com/tum-pbs/StableBPTT.
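To make the phrase "gradient stopping and component-wise comparison" concrete, the following is a minimal illustrative sketch and not the authors' implementation. It rests on several assumptions that do not come from the abstract: a scalar linear "simulator" x_{t+1} = a*x_t + b*u_t, a two-parameter controller u_t = w[0]*x_t + w[1], a terminal loss x_T^2, a gradient stop placed on the controller's input so that feedback flows backward only through the simulator, and a sign comparison against the true gradient as the component-wise operation. All function names (`rollout`, `backward`, `combine`) are hypothetical.

```python
def rollout(w, x0, steps, a=0.9, b=0.5):
    """Unroll the toy simulator x_{t+1} = a*x_t + b*u_t with u_t = w[0]*x_t + w[1]."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        u = w[0] * x + w[1]          # controller output (assumed linear)
        xs.append(a * x + b * u)     # simulator step (assumed linear)
    return xs


def backward(w, xs, a=0.9, b=0.5, stop_feedback=False):
    """Manual backpropagation through time for the loss xs[-1]**2.

    With stop_feedback=True the state adjoint is propagated only through the
    simulator (factor a), mimicking a stop-gradient on the controller input.
    """
    adjoint = 2.0 * xs[-1]           # dL/dx_T
    grad = [0.0, 0.0]
    for x in reversed(xs[:-1]):
        grad[0] += adjoint * b * x   # contribution via w[0]
        grad[1] += adjoint * b       # contribution via w[1]
        if stop_feedback:
            step_jacobian = a        # simulator path only
        else:
            step_jacobian = a + b * w[0]  # simulator path plus controller path
        adjoint *= step_jacobian
    return grad


def combine(modified, reference):
    """Keep a modified component only where its sign agrees with the reference."""
    return [m if m * r > 0 else 0.0 for m, r in zip(modified, reference)]


# Usage: compare the full gradient with the modified, sign-filtered update.
w = [-4.0, 0.1]
xs = rollout(w, x0=1.0, steps=3)
full_grad = backward(w, xs)
modified_grad = backward(w, xs, stop_feedback=True)
update = combine(modified_grad, full_grad)
```

In this sketch the per-step factor of the full backward pass, a + b*w[0], depends on the learned parameter and can grow or shrink without bound, while the stopped variant always uses the fixed simulator factor a. The sign filter is one simple way to prevent the modified field from pointing against the true gradient in any component.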