We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates. We model episode sessions, i.e., the portions of an episode during which the latent state is fixed, and propose three key modifications to existing meta-RL methods: consistency of latent information within sessions, session masking, and prior latent conditioning. We demonstrate the importance of these modifications in various domains, ranging from discrete Gridworld environments to continuous-control and simulated robot assistive tasks, and show that DynaMITE-RL significantly outperforms state-of-the-art baselines in sample efficiency and inference returns.
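To make the three modifications concrete, the sketch below shows one way they might enter a variational training loss: a KL term against a prior conditioned on the previous session's latent, reconstruction terms masked to the current session, and a consistency penalty on the posterior within a session. This is a minimal illustrative reconstruction, not the authors' implementation; the function name, tensor shapes, session ids, and equal weighting of the terms are all hypothetical assumptions.

```python
# Hypothetical sketch of a DynaMITE-RL-style loss; not the paper's actual code.
import torch
from torch.distributions import Normal, kl_divergence


def dynamite_style_loss(post_mu, post_std, recon_err, session_ids,
                        prior_mu, prior_std):
    """
    post_mu, post_std   : (T, D) per-timestep posterior over the latent state
    recon_err           : (T,)   per-timestep model reconstruction error
    session_ids         : (T,)   integer id of the session each timestep is in
    prior_mu, prior_std : (T, D) prior conditioned on the previous session's latent
    """
    post = Normal(post_mu, post_std)
    prior = Normal(prior_mu, prior_std)

    # Prior latent conditioning: KL between the per-step posterior and a prior
    # centred on the preceding session's latent, rather than a fixed N(0, I).
    kl = kl_divergence(post, prior).sum(-1)  # (T,)

    # Session masking: only reconstruction terms inside the current session
    # contribute, since the latent is assumed fixed only within a session.
    current = (session_ids == session_ids[-1]).float()
    masked_recon = (recon_err * current).sum()

    # Latent consistency within sessions: penalise posterior drift between
    # consecutive timesteps that share a session id.
    same_session = (session_ids[1:] == session_ids[:-1]).float()
    consistency = (kl_divergence(Normal(post_mu[1:], post_std[1:]),
                                 Normal(post_mu[:-1], post_std[:-1])).sum(-1)
                   * same_session).sum()

    return masked_recon + kl.sum() + consistency


if __name__ == "__main__":
    # Toy usage with random inputs and three sessions of unequal length.
    T, D = 8, 4
    ids = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])
    loss = dynamite_style_loss(torch.randn(T, D), torch.ones(T, D) * 0.5,
                               torch.rand(T), ids,
                               torch.zeros(T, D), torch.ones(T, D))
    print(loss.item())
```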