Deploying machine learning algorithms for robot tasks in real-world applications presents a core challenge: overcoming the domain gap between the training and deployment environments. This is especially difficult for visuomotor policies that take high-dimensional images as input, particularly when those images are generated in simulation. A common way to tackle this issue is domain randomization, which broadens the training distribution in the hope of covering the test-time distribution. However, this approach is effective only when the randomization actually encompasses the shifts encountered at test time. We take a different approach: we use a single demonstration (a prompt) to learn a policy that adapts to the target environment at test time. Our proposed framework, PromptAdapt, leverages the Transformer architecture's capacity to model sequential data to learn demonstration-conditioned visual policies, enabling in-context adaptation to a target domain distinct from the training domain. Experiments in both simulation and real-world settings show that PromptAdapt is a strong domain-adapting policy, outperforming baseline methods by a large margin under a range of domain shifts, including variations in lighting, color, texture, and camera pose. Videos and more information are available at the project webpage: https://sites.google.com/view/promptadapt.
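To make the idea of a demonstration-conditioned policy concrete, the toy sketch below shows one plausible reading of the mechanism: tokens from the current observation attend over tokens from a single target-domain demonstration (the prompt), and a linear head maps the attended features to an action. This is a minimal illustration using plain numpy attention, not the authors' implementation; all class and weight names here are hypothetical, and real PromptAdapt operates on image features with a full Transformer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: queries attend over key/value tokens
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

class DemoConditionedPolicy:
    """Toy demonstration-conditioned policy (hypothetical names):
    the current observation token queries the demonstration (prompt)
    tokens via cross-attention, so the same weights produce different
    behavior depending on which demonstration is supplied in context."""

    def __init__(self, d_model=16, action_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.Wq = rng.standard_normal((d_model, d_model)) * scale
        self.Wk = rng.standard_normal((d_model, d_model)) * scale
        self.Wv = rng.standard_normal((d_model, d_model)) * scale
        self.head = rng.standard_normal((d_model, action_dim)) * scale

    def act(self, obs_token, demo_tokens):
        q = obs_token @ self.Wq      # (1, d): query from current observation
        k = demo_tokens @ self.Wk    # (T, d): keys from the demonstration
        v = demo_tokens @ self.Wv    # (T, d): values from the demonstration
        ctx = attention(q, k, v)     # (1, d): demo-conditioned context
        return (ctx @ self.head).ravel()  # action vector, shape (action_dim,)
```

Swapping in a demonstration recorded in the target domain changes `demo_tokens`, and hence the attended context and the action, without any weight update: this is the in-context adaptation the abstract describes.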