Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping

Reinforcement learning often has to cope with the exponential growth of the state and action spaces when seeking optimal control in high-dimensional settings (the curse of dimensionality). In this work, we address this issue by learning the inherent structure of action-wise similar MDPs, so as to balance performance degradation against sample and computational complexity. In particular, we partition the action space into multiple groups based on similarity in transition distribution and reward function, and build a linear decomposition model to capture the differences between the intra-group transition kernels and the intra-group rewards. Both our theoretical analysis and our experiments reveal a surprising and counter-intuitive result: while a more refined grouping strategy reduces the approximation error caused by treating actions in the same group as identical, it also increases the estimation error when the number of samples or the computational resources are limited. This finding highlights the grouping strategy as a new degree of freedom that can be optimized to minimize the overall performance loss. To exploit it, we formulate a general optimization problem for determining the optimal grouping strategy, which strikes a balance between performance loss and sample/computational complexity. We further propose a computationally efficient method for selecting a nearly optimal grouping strategy, whose computational complexity is independent of the size of the action space.
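The trade-off described above can be illustrated with a small sketch. This is not the paper's model or its bounds: it assumes a toy single-state setting in which each action has one known mean reward, groups are contiguous blocks of reward-sorted actions, and the estimation error is a Hoeffding-style width with constants dropped. All function names (`group_actions`, `approximation_error`, `estimation_error`, `tradeoff`) and the numbers used are hypothetical and chosen only for illustration.

```python
import math

def group_actions(rewards, num_groups):
    """Sort actions by reward and cut them into contiguous groups of near-equal size."""
    order = sorted(range(len(rewards)), key=lambda a: rewards[a])
    size = math.ceil(len(order) / num_groups)
    return [order[i:i + size] for i in range(0, len(order), size)]

def approximation_error(rewards, groups):
    """Largest gap between an action's reward and the mean reward of its group."""
    worst = 0.0
    for group in groups:
        mean = sum(rewards[a] for a in group) / len(group)
        worst = max(worst, max(abs(rewards[a] - mean) for a in group))
    return worst

def estimation_error(num_groups, budget):
    """Assumed error width when `budget` samples are split evenly across groups."""
    return math.sqrt(num_groups / budget)

def tradeoff(rewards, budget, group_counts):
    """Return (group count, approximation error, estimation error, total) per setting."""
    rows = []
    for k in group_counts:
        groups = group_actions(rewards, k)
        approx = approximation_error(rewards, groups)
        est = estimation_error(len(groups), budget)
        rows.append((k, approx, est, approx + est))
    return rows

# Hypothetical setup: 64 actions with evenly spaced rewards and 256 samples in total.
rewards = [a / 63 for a in range(64)]
budget = 256

for k, approx, est, total in tradeoff(rewards, budget, [1, 2, 4, 8, 16, 32, 64]):
    print(f"groups={k:2d}  approx={approx:.3f}  est={est:.3f}  total={total:.3f}")
```

In this toy setup the approximation term shrinks as the grouping becomes finer while the estimation term grows, so the sum is smallest at an intermediate number of groups rather than at either extreme.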

