Model-based reinforcement learning is a powerful tool, but collecting data to fit an accurate model of the system can be costly. Exploring an unknown environment in a sample-efficient manner is therefore of great importance. However, the complexity of the dynamics and the computational limitations of real systems make this task challenging. In this work, we introduce FLEX, an exploration algorithm for nonlinear dynamics based on optimal experimental design. Our policy maximizes the information gained at the next step, which yields an adaptive exploration algorithm that is compatible with generic parametric learning models and requires minimal resources. We test our method on a number of nonlinear environments covering different settings, including time-varying dynamics. Keeping in mind that exploration is intended to serve an exploitation objective, we also test our algorithm on downstream model-based classical control tasks and compare it to other state-of-the-art model-based and model-free approaches. FLEX achieves competitive performance at a low computational cost.
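To make the idea of maximizing next-step information concrete, the following is a minimal sketch of one-step greedy D-optimal experimental design. It is not the FLEX algorithm itself: it assumes a scalar state, a finite set of candidate inputs, and a model that is linear in its parameters through a hypothetical feature map `features`; all function names are illustrative assumptions.

```python
import numpy as np

def features(x, u):
    """Hypothetical feature map phi(x, u); an assumption, not from the paper."""
    return np.array([x, u, np.sin(x), x * u, 1.0])

def greedy_d_optimal_input(M_inv, x, candidates):
    """Pick the candidate input that maximizes phi^T M^{-1} phi.

    By the matrix determinant lemma,
    det(M + phi phi^T) = det(M) * (1 + phi^T M^{-1} phi),
    so this choice maximizes the one-step gain in log det of the
    information matrix M (the D-optimality criterion).
    """
    gains = [features(x, u) @ M_inv @ features(x, u) for u in candidates]
    return candidates[int(np.argmax(gains))]

def sherman_morrison_update(M_inv, phi):
    """Return (M + phi phi^T)^{-1} from M^{-1}, assuming M is symmetric."""
    M_phi = M_inv @ phi
    return M_inv - np.outer(M_phi, M_phi) / (1.0 + phi @ M_phi)

def explore(step, x0, candidates, horizon, reg=1e-3):
    """Run the greedy exploration loop on an environment `step(x, u) -> x_next`."""
    d = features(x0, candidates[0]).size
    M_inv = np.eye(d) / reg  # inverse of the regularized initial information matrix
    x, data = x0, []
    for _ in range(horizon):
        u = greedy_d_optimal_input(M_inv, x, candidates)
        phi = features(x, u)
        x_next = step(x, u)
        data.append((phi, x_next))  # regression pair for fitting the model later
        M_inv = sherman_morrison_update(M_inv, phi)
        x = x_next
    return data, M_inv
```

Each step costs a few matrix-vector products in the feature dimension, which illustrates why a one-step information criterion can remain cheap. The collected pairs in `data` can then be passed to any least-squares fit of the model parameters.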