Reusing pre-collected data from different domains is an appealing solution for decision-making tasks, especially when target-domain data are limited. Existing cross-domain policy transfer methods mostly learn domain correspondences or corrections to facilitate policy learning, for example task- or domain-specific discriminators, representations, or policies. This design philosophy often leads to heavy model architectures or task- and domain-specific modeling, which limits flexibility. This raises a question: can we bridge domain gaps universally at the data level, instead of relying on complex downstream cross-domain policy transfer procedures? In this study, we propose the Cross-Domain Trajectory EDiting (xTED) framework, which employs a specially designed diffusion model for cross-domain trajectory adaptation. The proposed model architecture effectively captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns within target data. By adding noise to source-domain trajectories and then denoising them with the pre-trained diffusion model, xTED transforms these trajectories to align with target-domain properties while preserving their original semantic information. This process corrects underlying domain gaps, enhancing the state realism and dynamics reliability of the source data, and allows flexible integration with various single-domain and cross-domain downstream policy learning methods. Despite its simplicity, xTED demonstrates superior performance in extensive simulation and real-robot experiments.
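The noise-then-denoise editing idea can be illustrated with a minimal sketch in the spirit of SDEdit. This is not xTED's model or architecture: the trained diffusion model is replaced by a hypothetical closed-form denoiser that assumes target-domain data are independent Gaussians per feature (mean `mu`, standard deviation `std`). The trajectory layout (batch, horizon, features), the linear noise schedule, the editing step `k`, and the helper names `linear_alpha_bars`, `gaussian_denoiser`, and `edit_trajectories` are all illustrative assumptions.

```python
import numpy as np

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a standard linear DDPM schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def gaussian_denoiser(x_t, abar_t, mu, std):
    """Exact E[x_0 | x_t] when target data are N(mu, std^2) per feature.

    Hypothetical stand-in for a diffusion model trained on target-domain data.
    """
    var_t = abar_t * std**2 + (1.0 - abar_t)  # marginal variance of x_t
    gain = np.sqrt(abar_t) * std**2 / var_t   # Cov(x_0, x_t) / Var(x_t)
    return mu + gain * (x_t - np.sqrt(abar_t) * mu)

def edit_trajectories(src, mu, std, k=500, T=1000, seed=0):
    """Noise source data up to step k, then denoise with deterministic DDIM steps."""
    rng = np.random.default_rng(seed)
    abars = linear_alpha_bars(T)
    # Forward process: jump directly to diffusion step k (array index k - 1).
    a = abars[k - 1]
    x = np.sqrt(a) * src + np.sqrt(1.0 - a) * rng.standard_normal(src.shape)
    # Reverse process: DDIM updates (eta = 0) from step k back to clean data.
    for t in range(k - 1, -1, -1):
        a_t = abars[t]
        a_prev = abars[t - 1] if t > 0 else 1.0
        x0_hat = gaussian_denoiser(x, a_t, mu, std)
        eps_hat = (x - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return x

# Toy batch: 64 trajectories, horizon 10, 5 features per step
# (e.g. state, action, and reward concatenated), drawn from a "source" domain.
rng = np.random.default_rng(1)
source = rng.normal(0.0, 0.5, size=(64, 10, 5))
edited = edit_trajectories(source, mu=2.0, std=0.5, k=500)
print(source.mean(), edited.mean())
```

The edited batch keeps its shape and stays correlated with the source samples, while its statistics move toward the assumed target distribution. In this toy setting, a smaller `k` preserves more of each source sample and a larger `k` pulls the result further toward the target, which mirrors the realism-versus-preservation trade-off that noise-and-denoise editing has to balance.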