Learning domain-adaptive policies that generalize to unseen transition dynamics remains a fundamental challenge in learning-based control. Substantial progress has been made through domain representation learning, which captures domain-specific information and thereby enables domain-aware decision making. We analyze the process of learning domain representations through dynamical prediction and find that selecting contexts adjacent to the current step causes the learned representations to entangle static domain information with varying dynamical properties. Such a mixture can confuse the conditioned policy, thereby constraining zero-shot adaptation. To tackle this challenge, we propose DADP (Domain Adaptive Diffusion Policy), which achieves robust adaptation through unsupervised disentanglement and domain-aware diffusion injection. First, we introduce Lagged Context Dynamical Prediction, a strategy that conditions future state estimation on a historical offset context; by increasing this temporal gap, we disentangle static domain representations without supervision, filtering out transient properties. Second, we integrate the learned domain representations directly into the generative process by biasing the prior distribution and reformulating the diffusion target. Extensive experiments on challenging locomotion and manipulation benchmarks demonstrate the superior performance and generalizability of DADP over prior methods. More visualization results are available at https://outsider86.github.io/DomainAdaptiveDiffusionPolicy/.
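To make the idea of a lagged context concrete, the following is a minimal sketch of how a training example for dynamical prediction might be assembled when the context window is offset from the current step. It is not the authors' implementation: the function names, the `context_len` and `lag` parameters, and the array layout are all assumptions introduced for illustration, and the abstract does not specify the encoder or predictor that would consume these arrays.

```python
import numpy as np


def lagged_context_indices(t, context_len, lag):
    """Indices of a context window that ends `lag` steps before step t.

    Hypothetical helper: lag=0 gives the adjacent context
    [t - context_len, t); a larger lag shifts the window further back.
    """
    end = t - lag                # exclusive end of the context window
    start = end - context_len
    if start < 0:
        raise ValueError("not enough history for this context_len and lag")
    return np.arange(start, end)


def build_prediction_example(states, actions, t, context_len, lag):
    """Assemble (context, query, target) for one dynamical-prediction sample.

    Assumed layout: states has shape (T + 1, state_dim) and actions has
    shape (T, action_dim), all taken from a single trajectory.
    """
    idx = lagged_context_indices(t, context_len, lag)
    # Context: past transitions, taken `lag` steps away from the query step.
    context = np.concatenate([states[idx], actions[idx]], axis=-1)
    # Query: the current state-action pair whose outcome is to be predicted.
    query = np.concatenate([states[t], actions[t]], axis=-1)
    # Target: the next state.
    target = states[t + 1]
    return context, query, target


# Toy trajectory with T = 10 steps, 2-D states and 1-D actions.
states = np.arange(22, dtype=float).reshape(11, 2)
actions = np.zeros((10, 1))

adjacent = lagged_context_indices(t=8, context_len=3, lag=0)
lagged = lagged_context_indices(t=8, context_len=3, lag=4)
context, query, target = build_prediction_example(
    states, actions, t=8, context_len=3, lag=4
)
```

In this sketch, the only difference between the adjacent and the lagged setting is where the context window sits relative to the query step. The intuition stated in the abstract is that a distant window shares the static domain with the query step but fewer of its transient properties.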