SemTra: A Semantic Skill Translator for Cross-Domain Zero-Shot Policy Adaptation

This work explores the zero-shot adaptation capability of semantic skills, semantically interpretable experts' behavior patterns, in cross-domain settings, where a user input in interleaved multi-modal snippets can prompt a new long-horizon task for different domains. In these cross-domain settings, we present a semantic skill translator framework SemTra which utilizes a set of multi-modal models to extract skills from the snippets, and leverages the reasoning capabilities of a pretrained language model to adapt these extracted skills to the target domain. The framework employs a two-level hierarchy for adaptation: task adaptation and skill adaptation. During task adaptation, seq-to-seq translation by the language model transforms the extracted skills into a semantic skill sequence, which is tailored to fit the cross-domain contexts. Skill adaptation focuses on optimizing each semantic skill for the target domain context, through parametric instantiations that are facilitated by language prompting and contrastive learning-based context inferences. This hierarchical adaptation empowers the framework to not only infer a complex task specification in one-shot from the interleaved multi-modal snippets, but also adapt it to new domains with zero-shot learning abilities. We evaluate our framework with Meta-World, Franka Kitchen, RLBench, and CARLA environments. The results clarify the framework's superiority in performing long-horizon tasks and adapting to different domains, showing its broad applicability in practical use cases, such as cognitive robots interpreting abstract instructions and autonomous vehicles operating under varied configurations.

翻译：摘要：本文探索了语义技能（即语义可解释的专家行为模式）在跨域设置中的零样本适应能力。在跨域场景下，用户以交错的多模态片段形式输入信息，可触发不同域中的全新长时域任务。为此，我们提出语义技能转换框架SemTra，该框架利用一组多模态模型从片段中提取技能，并借助预训练语言模型的推理能力将这些提取的技能适配到目标域。该框架采用两级层次结构进行适应：任务适应和技能适应。在任务适应阶段，语言模型通过序列到序列的翻译将提取的技能转化为语义技能序列，该序列针对跨域上下文进行了定制。技能适应则聚焦于优化每个语义技能在目标域上下文中的表现，通过语言提示和基于对比学习的上下文推断实现参数化实例化。这种分层适应机制使框架不仅能从交错的多模态片段中一次性推断复杂任务规范，还能通过零样本学习能力将其适应至新域。我们在Meta-World、Franka Kitchen、RLBench和CARLA环境中评估了该框架。结果证实了该框架在执行长时域任务及适应不同域方面的优越性，展示了其在认知机器人解析抽象指令、自动驾驶汽车在多配置下运行等实际用例中的广泛适用性。