Recent advances in reinforcement learning (RL) have shown considerable promise across a variety of applications. However, issues such as scalability, explainability, and Markovian assumptions limit its applicability in certain domains. We observe that many of these shortcomings stem from the simulator rather than from the RL training algorithms themselves. We therefore propose a semantic proxy for simulation based on a temporal extension to annotated logic. In comparison with two high-fidelity simulators, we show a speed-up of up to three orders of magnitude while preserving the quality of the learned policy. In addition, we show the ability to model and leverage non-Markovian dynamics and instantaneous actions while providing an explainable trace describing the outcomes of the agent's actions.