Applying reinforcement learning (RL) to real-world applications requires balancing asymptotic performance, sample efficiency, and inference time. In this work, we demonstrate how to address this triple challenge by leveraging partial physical knowledge of the system dynamics. Our approach involves learning a physics-informed model to boost sample efficiency and generating imaginary trajectories from this model to learn a model-free policy and Q-function. Furthermore, we propose a hybrid planning strategy that combines the learned policy and Q-function with the learned model to improve time efficiency in planning. Through practical demonstrations, we show that our method achieves a better trade-off among sample efficiency, time efficiency, and performance than state-of-the-art methods.
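The hybrid planning strategy described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `model_step`, `policy`, and `q_value` are hypothetical stand-ins for the learned physics-informed model, the model-free actor, and the learned Q-function. The idea it demonstrates is that candidate action sequences are seeded by the policy, rolled out in imagination through the model, and scored by accumulated reward plus a terminal Q-value, so only a short horizon needs to be planned at inference time.

```python
# Hedged sketch of hybrid planning: policy-seeded candidates, imaginary
# rollouts through a learned model, terminal bootstrapping with a Q-function.
# All three learned components below are toy stand-ins, not the real method.
import numpy as np

rng = np.random.default_rng(0)

def model_step(state, action):
    """Toy dynamics model (stand-in for the physics-informed model):
    returns next state and reward."""
    next_state = 0.9 * state + 0.1 * action
    reward = -float(np.sum(next_state ** 2))
    return next_state, reward

def policy(state):
    """Toy learned policy (stand-in for the model-free actor)."""
    return -0.5 * state

def q_value(state, action):
    """Toy learned Q-function, used as a terminal value estimate."""
    return -float(np.sum(state ** 2) + 0.1 * np.sum(action ** 2))

def hybrid_plan(state, horizon=5, n_candidates=32, noise=0.1, gamma=0.99):
    """Score policy-seeded candidate action sequences via imaginary
    rollouts through the model; return the best first action (MPC-style)."""
    best_action, best_return = None, -np.inf
    for _ in range(n_candidates):
        s, total, discount = state.copy(), 0.0, 1.0
        first_action = None
        for t in range(horizon):
            # Perturb the policy's proposal to explore around it.
            a = policy(s) + noise * rng.standard_normal(s.shape)
            if t == 0:
                first_action = a
            s, r = model_step(s, a)
            total += discount * r
            discount *= gamma
        # Terminal bootstrap: Q-value summarizes returns beyond the horizon,
        # which is what keeps the planning horizon (and inference time) short.
        total += discount * q_value(s, policy(s))
        if total > best_return:
            best_return, best_action = total, first_action
    return best_action

action = hybrid_plan(np.array([1.0, -0.5]))
print(action.shape)  # (2,)
```

Because the Q-function absorbs the long-term return, the planner can use a much shorter horizon than pure model-predictive control would need, which is the source of the time-efficiency gain claimed in the abstract.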