Sim-to-Real Transfer via a Style-Identified Cycle Consistent Generative Adversarial Network: Zero-Shot Deployment on Robotic Manipulators through Visual Domain Adaptation

Lucía Güitta-López, Lionel Güitta-López, Jaime Boal, Álvaro Jesús López-López

The sample efficiency challenge in Deep Reinforcement Learning (DRL) compromises its industrial adoption due to the high cost and time demands of real-world training. Virtual environments offer a cost-effective alternative for training DRL agents, but the transfer of learned policies to real setups is hindered by the sim-to-real gap. Achieving zero-shot transfer, where agents perform directly in real environments without additional tuning, is particularly desirable for its efficiency and practical value. This work proposes a novel domain adaptation approach relying on a Style-Identified Cycle Consistent Generative Adversarial Network (StyleID-CycleGAN or SICGAN), an original model based on the Cycle Consistent Generative Adversarial Network (CycleGAN). SICGAN translates raw virtual observations into real-synthetic images, creating a hybrid domain for training DRL agents that combines virtual dynamics with real-like visual inputs. Following virtual training, the agent can be directly deployed, bypassing the need for real-world training. The pipeline is validated with two distinct industrial robots in the approaching phase of a pick-and-place operation. In virtual environments, agents achieve success rates of 90% to 100%, and real-world deployment confirms robust zero-shot transfer (i.e., without additional training in the physical environment) with accuracies above 95% for most workspace regions. We use augmented reality targets to make the evaluation process more efficient, and experimentally demonstrate that the agent successfully generalizes to real objects of varying colors and shapes, including LEGO® cubes and a mug. These results establish the proposed pipeline as an efficient, scalable solution to the sim-to-real problem.
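The hybrid domain described in the abstract (simulator dynamics combined with translated, real-like observations) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in and not the authors' implementation: `ToySimEnv` replaces the virtual robot environment, `placeholder_generator` replaces a trained sim-to-real generator such as SICGAN, and `TranslatedObservationEnv` is an assumed wrapper name.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # hypothetical grayscale frame, pixel values in [0, 1]


class ToySimEnv:
    """Hypothetical stand-in for the simulator: a 1-D reaching task with a 1x5 frame."""

    def __init__(self) -> None:
        self.position = 0  # end-effector cell
        self.target = 3    # target cell

    def _render(self) -> Image:
        row = [0.0] * 5
        row[self.target] = 0.5    # target pixel
        row[self.position] = 1.0  # end-effector pixel (drawn on top of the target)
        return [row]

    def reset(self) -> Image:
        self.position = 0
        return self._render()

    def step(self, action: int) -> Tuple[Image, float, bool]:
        # Dynamics stay purely virtual: clamp the move to the 5-cell workspace.
        self.position = max(0, min(4, self.position + action))
        done = self.position == self.target
        return self._render(), (1.0 if done else 0.0), done


def placeholder_generator(frame: Image) -> Image:
    """Stand-in for a trained sim-to-real generator: a fixed per-pixel intensity remap."""
    return [[0.8 * px + 0.1 for px in row] for row in frame]


class TranslatedObservationEnv:
    """Wraps a simulator so the agent only ever sees translated observations."""

    def __init__(self, env: ToySimEnv, translate: Callable[[Image], Image]) -> None:
        self.env = env
        self.translate = translate

    def reset(self) -> Image:
        return self.translate(self.env.reset())

    def step(self, action: int) -> Tuple[Image, float, bool]:
        obs, reward, done = self.env.step(action)
        # Rewards and termination come from the simulator; only the image changes.
        return self.translate(obs), reward, done


# Usage: roll the wrapped environment forward with a fixed "move right" action.
env = TranslatedObservationEnv(ToySimEnv(), placeholder_generator)
obs = env.reset()
reward, done = 0.0, False
for _ in range(3):
    obs, reward, done = env.step(+1)
```

Under this assumption, an agent trained on the wrapped environment never sees raw simulator frames, so at deployment the translation step is dropped and real camera images are fed to the policy directly.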
