This paper proposes Hamiltonian Learning, a novel unified framework for learning with neural networks "over time", i.e., from a possibly infinite stream of data, in an online manner, without access to future information. Existing works focus on the simplified setting in which the stream has a known finite length or is segmented into smaller sequences, leveraging well-established learning strategies from statistical machine learning. In this paper, the problem of learning over time is rethought from scratch, leveraging tools from optimal control theory, which yield a unifying view of the temporal dynamics of neural computations and learning. Hamiltonian Learning is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open up novel perspectives. The proposed framework is showcased by experimentally demonstrating how it can recover gradient-based learning, comparing it to out-of-the-box optimizers, and describing how it is flexible enough to switch from fully local to partially/non-local computational schemes, possibly distributed over multiple devices, and to perform BackPropagation without storing activations. Hamiltonian Learning is easy to implement and can help researchers approach the problem of learning over time in a principled and innovative manner.
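To make the connection between differential equations and gradient-based learning concrete, the sketch below is a minimal, hypothetical illustration (not the paper's actual equations): plain online gradient descent read as the forward-Euler integration of the parameter ODE dw/dt = -∇L(w, x_t), integrated step by step on a data stream without any external solver. All names and constants here are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed, not the paper's equations): online learning as
# forward-Euler integration of  dw/dt = -grad L(w, x_t).  With integration
# step `dt`, each Euler step coincides with one step of gradient descent.

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # parameters of a 1-D linear model y = w[0]*x + w[1]
dt = 0.05                # integration step, playing the role of the learning rate

def grad_loss(w, x, y):
    """Gradient of the squared error 0.5*(w[0]*x + w[1] - y)^2 w.r.t. w."""
    err = w[0] * x + w[1] - y
    return np.array([err * x, err])

# Samples arrive as a stream and are processed strictly online,
# with no access to future data and no stored activations.
for t in range(1000):
    x = rng.uniform(-1.0, 1.0)
    y = 3.0 * x - 0.5                    # ground-truth target
    w = w - dt * grad_loss(w, x, y)      # one Euler step of the parameter ODE

print(w)  # approaches [3.0, -0.5]
```

Because the integration is an explicit Euler scheme, each update depends only on the current sample and the current parameters, which is what makes the computation local in time and free of external ODE solvers.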