Learning-based robotic controllers are typically trained offline and deployed with fixed parameters, which limits their ability to cope with unforeseen changes during operation. Inspired by biological adaptation, this work presents a framework for online Continual Reinforcement Learning that enables automated adaptation during deployment. Building on DreamerV3, a model-based Reinforcement Learning algorithm, the proposed method leverages world model prediction residuals to detect out-of-distribution events and automatically trigger finetuning. Adaptation progress is monitored using both task-level performance signals and internal training metrics, allowing convergence to be assessed without external supervision or domain knowledge. The approach is validated on a variety of contemporary continuous control problems, including a quadruped robot in high-fidelity simulation and a real-world model vehicle. Relevant metrics and their interpretation are presented and discussed, and the resulting trade-offs are described. The results sketch out how autonomous robotic agents could one day move beyond static training regimes toward adaptive systems capable of self-reflection and self-improvement during operation, much like their biological counterparts.
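The core trigger mechanism described above can be illustrated with a minimal sketch. The class below is a hypothetical illustration, not the paper's implementation: it assumes the world model's one-step prediction residual is available as a scalar per step, calibrates a threshold from in-distribution residuals collected before deployment, and flags an out-of-distribution event (i.e., a cue to start finetuning) when a rolling average of deployment residuals exceeds that threshold.

```python
from collections import deque


class ResidualOODTrigger:
    """Hypothetical sketch: flag out-of-distribution (OOD) events when the
    rolling mean of world model prediction residuals exceeds a threshold
    calibrated on in-distribution data (mean + k standard deviations)."""

    def __init__(self, window=100, k=3.0):
        self.window = deque(maxlen=window)  # recent deployment residuals
        self.k = k                          # threshold sensitivity
        self.mean = 0.0
        self.std = 0.0

    def calibrate(self, residuals):
        # residuals: in-distribution prediction errors gathered before deployment
        n = len(residuals)
        self.mean = sum(residuals) / n
        var = sum((r - self.mean) ** 2 for r in residuals) / n
        self.std = var ** 0.5

    def update(self, residual):
        # Returns True when the rolling residual indicates an OOD event,
        # which would trigger online finetuning of the agent.
        self.window.append(residual)
        rolling = sum(self.window) / len(self.window)
        return rolling > self.mean + self.k * self.std
```

In practice the same idea extends to monitoring adaptation progress: once finetuning is triggered, the residual statistic falling back below the threshold (together with task-level return stabilizing) can serve as a convergence signal.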