Cross-domain imitation learning (CDIL) accelerates policy learning by transferring expert knowledge across domains, which is valuable in applications where collecting expert data is costly. Existing methods are either supervised, relying on proxy tasks and explicit alignment, or unsupervised, aligning distributions without paired data but often proving unstable. We introduce the Semi-Supervised CDIL (SS-CDIL) setting and propose the first algorithm for SS-CDIL with theoretical justification. Our method uses only offline data, including a small number of target expert demonstrations and some unlabeled imperfect trajectories. To handle domain discrepancy, we propose a novel cross-domain loss function for learning inter-domain state-action mappings and design an adaptive weight function to balance source and target knowledge. Experiments on MuJoCo and Robosuite show consistent gains over the baselines, demonstrating that our approach achieves stable and data-efficient policy learning with minimal supervision. Our code is available at https://github.com/NYCU-RL-Bandits-Lab/CDIL.