Off-dynamics offline reinforcement learning (RL) aims to learn a policy for a target domain using limited target data and abundant source data collected under different transition dynamics. Existing methods typically address the dynamics mismatch either globally over the state space or via pointwise data filtering; the former can miss localized cross-domain similarities, while the latter can incur high computational cost. We propose Localized Dynamics-Aware Domain Adaptation (LoDADA), which exploits the localized structure of the dynamics mismatch to reuse source data more effectively. LoDADA clusters transitions from the source and target datasets and estimates a cluster-level dynamics discrepancy via domain discrimination. Source transitions from clusters with small discrepancy are retained, while those from clusters with large discrepancy are filtered out. This yields a fine-grained yet scalable data selection strategy that avoids both overly coarse global assumptions and expensive per-sample filtering. We provide theoretical insights and conduct extensive experiments across environments with diverse global and local dynamics shifts. The results show that LoDADA consistently outperforms state-of-the-art off-dynamics offline RL methods by better leveraging the localized structure of the dynamics mismatch.
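To make the cluster-then-filter selection step concrete, the following is a minimal sketch, not the authors' implementation: the choice of k-means for clustering, a logistic-regression domain discriminator, the deviation-from-chance discrepancy proxy, and the quantile-based threshold are all illustrative assumptions, and the function and parameter names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def select_source_transitions(src, tgt, n_clusters=50, keep_quantile=0.5, seed=0):
    """Cluster pooled transitions, score each cluster's dynamics
    discrepancy with a domain discriminator, and keep source
    transitions from the lowest-discrepancy clusters.

    src, tgt: arrays of shape (N, d) holding transition features,
              e.g. concatenated (s, a, s') vectors.
    Returns a boolean mask over the source transitions.
    """
    X = np.vstack([src, tgt])
    y = np.concatenate([np.zeros(len(src)), np.ones(len(tgt))])  # 0 = source, 1 = target

    # Step 1: joint clustering of source and target transitions.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    labels = km.labels_

    # Step 2: domain discriminator on transition features; with balanced
    # classes, predictions far from 0.5 indicate regions where source
    # and target transitions are easy to tell apart, i.e. where the
    # dynamics disagree.
    clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)
    p_target = clf.predict_proba(X)[:, 1]

    # Step 3: cluster-level discrepancy = mean deviation of the
    # discriminator output from chance within each cluster (empty
    # clusters are assigned infinite discrepancy).
    disc = np.array([
        np.abs(p_target[labels == c] - 0.5).mean() if np.any(labels == c) else np.inf
        for c in range(n_clusters)
    ])

    # Step 4: retain source transitions whose cluster's discrepancy
    # falls below the chosen quantile; filter out the rest.
    threshold = np.quantile(disc[np.isfinite(disc)], keep_quantile)
    keep_clusters = disc <= threshold
    src_labels = labels[: len(src)]
    return keep_clusters[src_labels]
```

Under these assumptions, the retained source transitions would simply be pooled with the target dataset and handed to any standard offline RL algorithm; because discrepancy is estimated once per cluster rather than once per sample, the selection cost scales with the number of clusters instead of the size of the source dataset.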