Policy search methods are central to reinforcement learning, offering a framework for continuous state-action spaces and partially observable problems. However, exploring vast policy spaces can be highly inefficient. Reducing the policy space through policy compression emerges as a powerful, reward-free approach to accelerate learning: the technique condenses the policy space into a smaller, representative set that retains most of the original's effectiveness. Our research determines the sample size needed to learn this compressed set accurately. We use R\'enyi divergence to measure the similarity between the true and estimated policy distributions and establish error bounds for good approximations. To simplify the analysis, we adopt the $l_1$ norm and derive sample-size requirements for both model-based and model-free settings. Finally, we relate the error bounds obtained under the $l_1$ norm to those under R\'enyi divergence, distinguishing between policies near the vertices of the policy space and those in its interior, to obtain lower and upper bounds on the required sample size.