Identifying the trade-offs between model-based and model-free methods is a central question in reinforcement learning. Value-based methods offer substantial computational advantages and are sometimes just as statistically efficient as model-based methods. However, focusing on the core problem of policy evaluation, we show that information about the transition dynamics may be impossible to represent in the space of value functions. We explore this through a series of case studies focused on structures that arise in many important problems. In several, there is no information loss and value-based methods are as statistically efficient as model-based ones. In other closely related examples, information loss is severe and value-based methods are substantially outperformed. A deeper investigation points to limited representational power as the driver of the inefficiency, rather than a failure of algorithm design.
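As a minimal illustration of the kind of information loss at issue (our own notation, not an example from the case studies): for a fixed policy with reward vector $r$ and transition matrix $P$, the value function is the unique solution of the Bellman equation

$$
V = r + \gamma P V \quad\Longleftrightarrow\quad V = (I - \gamma P)^{-1} r,
$$

and the map $P \mapsto V$ is many-to-one. For instance, if $r = (1-\gamma)\mathbf{1}$, then $V = \mathbf{1}$ for every stochastic matrix $P$, since $P\mathbf{1} = \mathbf{1}$; two MDPs with entirely different dynamics are then indistinguishable in value space, even though a model-based method would recover $P$ directly.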