The Forward-Forward (FF) Algorithm is a recently proposed learning procedure for neural networks that employs two forward passes instead of the forward and backward passes used in traditional backpropagation. However, FF remains largely confined to supervised settings, leaving a gap in domains where learning signals arise more naturally, such as reinforcement learning (RL). In this work, inspired by FF's goodness function built from layer activity statistics, we introduce Action-conditioned Root mean squared Q-Functions (ARQ), a novel value estimation method that combines a goodness function with action conditioning for local RL via temporal difference learning. Despite its simplicity and biological grounding, our approach outperforms state-of-the-art local, backprop-free RL methods on the MinAtar and DeepMind Control Suite benchmarks, and also surpasses algorithms trained with backpropagation on most tasks. Code can be found at https://github.com/agentic-learning-ai-lab/arq.
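To make the core idea concrete, below is a minimal sketch, not the paper's implementation, of how an RMS goodness readout of an action-conditioned layer could serve as a Q-value trained with a one-step TD objective. It assumes PyTorch, discrete actions (as in MinAtar) conditioned via one-hot concatenation, and a SARSA-style bootstrap target; all names (`rms_goodness`, `LocalLayer`, `td_loss`) and hyperparameters are illustrative. A single layer is shown for brevity; the local property means each layer in a deeper network would be trained with its own such loss, with no gradients flowing between layers.

```python
# Illustrative sketch of an ARQ-style value estimate; names and details are assumptions.
import torch
import torch.nn as nn


def rms_goodness(h: torch.Tensor) -> torch.Tensor:
    """Root mean square of a layer's activations, used as a scalar value readout."""
    return h.pow(2).mean(dim=-1).sqrt()


class LocalLayer(nn.Module):
    """One layer trained with a local TD objective (no gradients cross layer boundaries)."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.num_actions = num_actions
        # Action conditioning: concatenate a one-hot action vector to the observation.
        self.fc = nn.Linear(obs_dim + num_actions, hidden)

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        a = nn.functional.one_hot(action, self.num_actions).float()
        return torch.relu(self.fc(torch.cat([obs, a], dim=-1)))


def td_loss(layer: LocalLayer, obs, action, reward, next_obs, next_action, gamma=0.99):
    """One-step SARSA-style TD loss, treating the RMS goodness as Q(s, a)."""
    q = rms_goodness(layer(obs, action))
    with torch.no_grad():  # bootstrap target; no gradient through it
        target = reward + gamma * rms_goodness(layer(next_obs, next_action))
    return (q - target).pow(2).mean()
```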