Offline reinforcement learning enables sample-efficient policy acquisition without risky online interaction, yet policies trained on static datasets remain brittle under action-space perturbations such as actuator faults. This study introduces an offline-to-online framework that trains policies on clean data and then performs adversarial fine-tuning, where perturbations are injected into executed actions to induce compensatory behavior and improve resilience. A performance-aware curriculum further adjusts the perturbation probability during training via an exponential-moving-average signal, balancing robustness and stability throughout the learning process. Experiments on continuous-control locomotion tasks demonstrate that the proposed method consistently improves robustness over offline-only baselines and converges faster than training from scratch. Matching the fine-tuning conditions to the evaluation conditions yields the strongest robustness to action-space perturbations, while the adaptive curriculum strategy mitigates the degradation of nominal performance observed with the linear curriculum strategy. Overall, the results show that adversarial fine-tuning enables adaptive and robust control in uncertain environments, bridging the gap between offline efficiency and online adaptability.
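To make the curriculum mechanism concrete, the following is a minimal Python sketch of how an exponential-moving-average (EMA) performance signal could drive the perturbation probability during fine-tuning. The class name, the target-return normalization, and the additive Gaussian noise used as a stand-in for an actuator fault are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


class AdaptivePerturbationCurriculum:
    """Adjusts the probability of injecting action perturbations based on an
    EMA of recent episode returns (a sketch of the performance-aware curriculum)."""

    def __init__(self, p_min=0.0, p_max=0.5, ema_alpha=0.1, target_return=1000.0):
        self.p_min, self.p_max = p_min, p_max
        self.ema_alpha = ema_alpha            # EMA smoothing factor
        self.target_return = target_return    # hypothetical performance target
        self.ema_return = 0.0
        self.p_perturb = p_min

    def update(self, episode_return):
        # Smooth the performance signal with an exponential moving average.
        self.ema_return = ((1 - self.ema_alpha) * self.ema_return
                           + self.ema_alpha * episode_return)
        # Raise the perturbation probability as performance approaches the target,
        # so robustness pressure grows only once the policy is stable.
        progress = np.clip(self.ema_return / self.target_return, 0.0, 1.0)
        self.p_perturb = self.p_min + progress * (self.p_max - self.p_min)
        return self.p_perturb

    def maybe_perturb(self, action, rng, noise_scale=0.3):
        # With probability p_perturb, inject a perturbation into the executed
        # action (here: additive Gaussian noise as a simple fault model).
        if rng.random() < self.p_perturb:
            return action + rng.normal(0.0, noise_scale, size=np.shape(action))
        return action


# Example usage with placeholder returns and a 6-dimensional action.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    curriculum = AdaptivePerturbationCurriculum()
    for episode_return in [200.0, 500.0, 900.0]:
        p = curriculum.update(episode_return)
        action = curriculum.maybe_perturb(np.zeros(6), rng)
        print(f"p_perturb={p:.3f}, executed action={action}")
```

In this sketch the schedule increases perturbation pressure only as smoothed performance improves, which is one plausible way to realize the "adaptive" behavior contrasted with a linear schedule in the abstract.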