RoHIL: Robust Human-in-the-Loop Robotic Reinforcement Learning Against Illumination Variations

Human-in-the-loop (HIL) reinforcement learning systems achieve near-perfect success on the workstation where they are trained, but collapse when the same robot is moved to a workstation only a few meters away, because different lamp positions and window light shift the visual input distribution. Re-collecting demonstrations and re-running HIL on every workstation is impractical for deployment, and naively fine-tuning on shifted-light data triggers catastrophic forgetting of the source workstation. To close this cross-domain gap, we present RoHIL, an offline fine-tuning framework that requires no additional real-robot interaction. RoHIL combines (i) a world-model-based image relighter that re-synthesises the visual stream of source-workstation trajectories under multiple virtual HDRI environments while keeping the recorded actions and rewards unchanged; (ii) Illumination-Retention Replay (IRR), a data-level anti-forgetting mechanism that interleaves relit adaptation transitions with original-light retention transitions to preserve source-workstation Bellman coverage; and (iii) an anchored Bellman-actor regulariser that constrains representation and policy drift from the original source-workstation policy. Across four real-robot manipulation tasks under significant cross-workstation illumination variations, RoHIL substantially improves shifted-light performance where standard HIL-RL collapses, while preserving source-workstation performance. This removes the need to re-collect data and retrain for every new workstation and environment. Project page: https://anonymous4365.github.io/RoHIL/
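As an illustration of components (ii) and (iii), the sketch below shows one plausible way to interleave relit and original-light transitions in a replay batch and to penalise drift from a frozen anchor policy. It is not the RoHIL implementation: the function names `sample_irr_batch` and `anchor_penalty`, the `retention_fraction` and `weight` parameters, the dictionary transition format, and the squared-distance form of the penalty are all assumptions made for this example, since the abstract does not specify them.

```python
import random
from typing import Dict, List, Optional, Sequence

# Hypothetical transition format: any dict works; the abstract does not define one.
Transition = Dict[str, object]


def sample_irr_batch(
    relit: Sequence[Transition],
    retention: Sequence[Transition],
    batch_size: int,
    retention_fraction: float = 0.5,  # assumed mixing ratio, not taken from the paper
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Draw one batch that mixes relit and original-light transitions."""
    if not relit or not retention:
        raise ValueError("both transition pools must be non-empty")
    if not 0.0 <= retention_fraction <= 1.0:
        raise ValueError("retention_fraction must lie in [0, 1]")
    rng = rng or random.Random()
    # Fixed per-batch quota of original-light samples (Python rounds ties to even).
    n_retention = int(round(batch_size * retention_fraction))
    n_relit = batch_size - n_retention
    batch = [rng.choice(retention) for _ in range(n_retention)]
    batch += [rng.choice(relit) for _ in range(n_relit)]
    rng.shuffle(batch)  # interleave the two sources within the batch
    return batch


def anchor_penalty(
    action: Sequence[float],
    anchor_action: Sequence[float],
    weight: float = 1.0,  # assumed regularisation strength
) -> float:
    """Weighted squared distance between the current and frozen anchor actions."""
    if len(action) != len(anchor_action):
        raise ValueError("action vectors must have the same length")
    return weight * sum((a - b) ** 2 for a, b in zip(action, anchor_action))
```

In this sketch, a fixed per-batch quota keeps original-light data present in every update, which is one simple reading of "interleaving"; a real system might instead schedule the ratio over training, and would apply the anchor term to network outputs inside the actor and critic losses rather than to plain Python lists.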
