Neural ordinary differential equations (ODEs) are widely recognized as the standard for modeling physical mechanisms, which helps in performing approximate inference in unknown physical or biological environments. In partially observable (PO) environments, inferring unseen information from raw observations remains a challenge for agents. By using a recurrent policy with a compact context, context-based reinforcement learning provides a flexible way to extract unobservable information from historical transitions. To help the agent extract more dynamics-related information, we present a novel ODE-based recurrent model combined with a model-free reinforcement learning (RL) framework to solve partially observable Markov decision processes (POMDPs). We experimentally demonstrate the efficacy of our method across various PO continuous control and meta-RL tasks. Furthermore, our experiments show that our method is robust to irregular observations, owing to the ability of ODEs to model irregularly sampled time series.
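To make the idea of an ODE-based recurrent context concrete, the following is a minimal sketch, not the authors' implementation. It assumes an ODE-RNN-style scheme in which the hidden context evolves under a learned ODE during the time gap between observations and is then updated by a simple recurrent cell when an observation arrives. The class name `ODERecurrentEncoder`, the dynamics `dh/dt = tanh(W_dyn h)`, the fixed-step Euler integrator, and the random untrained weights are all illustrative assumptions.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.3):
    """Small random weight matrix as a list of rows (hypothetical initialisation)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ODERecurrentEncoder:
    """Toy ODE-RNN-style context encoder (illustrative, not the paper's model)."""

    def __init__(self, obs_dim, hidden_dim, seed=0, euler_step=0.05):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.euler_step = euler_step
        self.w_dyn = make_matrix(hidden_dim, hidden_dim, rng)  # ODE dynamics weights
        self.w_hid = make_matrix(hidden_dim, hidden_dim, rng)  # recurrent weights
        self.w_obs = make_matrix(hidden_dim, obs_dim, rng)     # observation weights

    def evolve(self, hidden, dt):
        """Integrate dh/dt = tanh(W_dyn h) over a gap dt with fixed-step Euler."""
        remaining = dt
        while remaining > 1e-12:
            step = min(self.euler_step, remaining)
            derivative = [math.tanh(v) for v in matvec(self.w_dyn, hidden)]
            hidden = [h + step * d for h, d in zip(hidden, derivative)]
            remaining -= step
        return hidden

    def update(self, hidden, observation):
        """Discrete recurrent update applied when an observation arrives."""
        pre = [a + b for a, b in zip(matvec(self.w_hid, hidden),
                                     matvec(self.w_obs, observation))]
        return [math.tanh(v) for v in pre]

    def encode(self, times, observations):
        """Return the final context for observations taken at the given times."""
        hidden = [0.0] * self.hidden_dim
        previous_time = times[0]
        for t, obs in zip(times, observations):
            hidden = self.evolve(hidden, t - previous_time)  # continuous-time gap
            hidden = self.update(hidden, obs)                # jump at observation
            previous_time = t
        return hidden


encoder = ODERecurrentEncoder(obs_dim=2, hidden_dim=4)
observations = [[1.0, -0.5], [0.2, 0.4], [-0.3, 0.9]]
regular_context = encoder.encode([0.0, 1.0, 2.0], observations)
irregular_context = encoder.encode([0.0, 0.3, 2.0], observations)
```

Because the time gap enters the computation explicitly through `evolve`, the same observations sampled at different times yield different contexts, which is the property that lets this kind of model handle irregularly sampled sequences. In an RL setting, the resulting context would be fed to the policy alongside the current observation; that wiring is omitted here.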