The adaptive learning capabilities seen in biological neural networks are largely a product of the self-modifying behavior that emerges from online plastic changes in synaptic connectivity. Current methods in Reinforcement Learning (RL) adjust to new interactions only after reflecting over a specified time interval, which prevents the emergence of online adaptivity. Recent work addressing this by endowing artificial neural networks with neuromodulated plasticity has been shown to improve performance on simple RL tasks trained using backpropagation, but has yet to scale up to larger problems. Here we study the problem of meta-learning in a challenging quadruped domain, where each leg of the quadruped has a chance of becoming unusable, requiring the agent to adapt by continuing locomotion with the remaining limbs. Results demonstrate that agents evolved using self-modifying plastic networks are more capable of adapting to complex meta-learning tasks, even outperforming the same network updated using gradient-based algorithms while taking less time to train.
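To make the idea of a self-modifying plastic network concrete, the following is a minimal sketch of one common formulation, assuming a Hebbian trace gated by a scalar modulatory signal. It is not taken from this work: the class name `PlasticLayer`, the parameters `w`, `alpha`, and `hebb`, the `tanh` activation, and the clipping range are all illustrative assumptions. The slow parameters would be set by an outer loop (evolution or gradient descent), while the trace changes online within an episode.

```python
import numpy as np


class PlasticLayer:
    """Hypothetical layer: slow weights plus a fast, neuromodulated Hebbian trace."""

    def __init__(self, n_in, n_out, rng):
        # Slow parameters, assumed to be optimized by the outer loop.
        self.w = rng.normal(0.0, 0.1, size=(n_in, n_out))
        self.alpha = rng.normal(0.0, 0.1, size=(n_in, n_out))  # per-synapse plasticity gain
        # Fast trace, assumed to start at zero for each episode.
        self.hebb = np.zeros((n_in, n_out))

    def step(self, x, modulation):
        # Effective weight = slow weight + plastic contribution.
        y = np.tanh(x @ (self.w + self.alpha * self.hebb))
        # Hebbian update (pre/post outer product) gated by the modulatory signal,
        # clipped so the trace stays bounded.
        self.hebb = np.clip(self.hebb + modulation * np.outer(x, y), -1.0, 1.0)
        return y
```

With `modulation` set to zero the trace is left unchanged and the layer behaves like a fixed-weight layer; a nonzero signal lets the connectivity change at every time step, which is the online adaptivity described above.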