We consider offline imitation learning from observations (LfO) in the regime where expert demonstrations are scarce and the available offline suboptimal data lie far from the expert behavior. Many existing distribution-matching approaches struggle in this regime because they impose strict support constraints and rely on brittle one-step models, making it hard to extract a useful learning signal from imperfect data. To tackle this challenge, we propose TGE, a trajectory-level generative embedding for offline LfO that constructs a dense, smooth surrogate reward by estimating the expert state density in the latent space of a temporal diffusion model trained on offline trajectory data. By leveraging the smooth geometry of the learned diffusion embedding, TGE captures long-horizon temporal dynamics and effectively bridges the gap between disjoint supports, ensuring a robust learning signal even when the offline data are distributionally distinct from the expert. Empirically, the proposed approach consistently matches or outperforms prior offline LfO methods across a range of D4RL locomotion and manipulation benchmarks.
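To make the surrogate-reward idea concrete, below is a minimal illustrative sketch (not the paper's implementation): a frozen temporal encoder maps windows of consecutive states to latents, a simple density estimator is fit on expert latents, and the log-density of an offline state window under that estimator serves as a dense reward. The encoder here is a fixed random projection standing in for the learned diffusion embedding; the window length, latent dimension, and KDE choice are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Stand-in for the learned temporal diffusion embedding: a fixed random
# projection of a flattened state window. In TGE this role is played by the
# frozen encoder of the diffusion model trained on offline trajectories.
H, STATE_DIM, LATENT_DIM = 4, 11, 8
PROJ = rng.normal(size=(H * STATE_DIM, LATENT_DIM)) / np.sqrt(H * STATE_DIM)

def encode(window: np.ndarray) -> np.ndarray:
    """Map an (H, STATE_DIM) window of states to a latent vector."""
    return window.reshape(-1) @ PROJ

def fit_expert_density(expert_states: np.ndarray) -> gaussian_kde:
    """Fit a KDE over expert latents (a simple stand-in density estimator)."""
    windows = [expert_states[t:t + H] for t in range(len(expert_states) - H + 1)]
    latents = np.stack([encode(w) for w in windows])   # (N, LATENT_DIM)
    return gaussian_kde(latents.T)                     # scipy expects (dim, N)

def surrogate_reward(kde: gaussian_kde, window: np.ndarray) -> float:
    """Dense surrogate reward: log expert density at the current latent."""
    return float(kde.logpdf(encode(window)[:, None]))

# Usage: relabel offline state windows with the dense surrogate reward.
expert_states = rng.normal(size=(200, STATE_DIM))      # placeholder expert data
offline_states = rng.normal(size=(50, STATE_DIM))      # placeholder offline data
kde = fit_expert_density(expert_states)
rewards = [surrogate_reward(kde, offline_states[t:t + H])
           for t in range(len(offline_states) - H + 1)]
```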