Large discrete action spaces remain a central challenge for reinforcement learning methods. Such spaces are encountered in many real-world applications, e.g., recommender systems, multi-step planning, and inventory replenishment. Mapping continuous proxies to discrete actions is a promising paradigm for handling large discrete action spaces. Existing continuous-to-discrete mapping approaches search for discrete neighboring actions in a static, pre-defined neighborhood, which requires neighbor lookups across the entire action space. Hence, these approaches still scale poorly as the action space grows. To mitigate this drawback, we propose a novel Dynamic Neighborhood Construction (DNC) method, which dynamically constructs a discrete neighborhood around the continuous proxy, thus efficiently exploiting the structure of the underlying action space. We demonstrate the robustness of our method by benchmarking it against three state-of-the-art approaches designed for large discrete action spaces across three different environments. Our results show that DNC matches or outperforms state-of-the-art approaches while being more computationally efficient. Furthermore, our method scales to action spaces that have so far remained computationally intractable for existing methodologies.
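To make the contrast between a static neighbor lookup and a dynamically constructed neighborhood concrete, the following is a minimal, hypothetical sketch. It assumes a multi-dimensional action space whose dimensions are integers in `[low, high]`, and it uses a user-supplied scoring function `q_value` as a stand-in for a learned critic. All names (`static_knn_lookup`, `dynamic_neighborhood`, `select_action`, `radius`) are illustrative assumptions. This is a simplified picture of the general idea, not the authors' DNC algorithm.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Action = Tuple[int, ...]


def static_knn_lookup(proxy: Sequence[float], action_space: List[Action], k: int) -> List[Action]:
    """Static approach (illustrative): scan every discrete action and return the k nearest.

    Cost is O(|A|), and |A| grows exponentially with the number of action dimensions.
    """
    def sq_dist(a: Action) -> float:
        return sum((ai - pi) ** 2 for ai, pi in zip(a, proxy))

    return sorted(action_space, key=sq_dist)[:k]


def dynamic_neighborhood(proxy: Sequence[float], low: int, high: int, radius: int = 1) -> List[Action]:
    """Hypothetical dynamic construction: build neighbors around the rounded proxy only.

    The proxy is rounded half-up to the nearest grid point and clipped to [low, high];
    neighbors differ from that base point in exactly one dimension by up to `radius` steps.
    Size is at most 1 + 2 * radius * n_dims, independent of |A|.
    """
    base = tuple(min(max(math.floor(p + 0.5), low), high) for p in proxy)
    neighbors = [base]
    for dim in range(len(base)):
        for step in range(1, radius + 1):
            for sign in (-1, 1):
                value = base[dim] + sign * step
                if low <= value <= high:  # skip perturbations that leave the action bounds
                    candidate = list(base)
                    candidate[dim] = value
                    neighbors.append(tuple(candidate))
    return neighbors


def select_action(proxy: Sequence[float], low: int, high: int,
                  q_value: Callable[[Action], float], radius: int = 1) -> Action:
    """Pick the neighbor with the highest score under the (assumed) critic `q_value`."""
    return max(dynamic_neighborhood(proxy, low, high, radius), key=q_value)


if __name__ == "__main__":
    proxy = (2.3, 7.8, 0.4)
    # Hypothetical critic that prefers actions close to (5, 5, 5).
    q = lambda a: -sum((x - 5) ** 2 for x in a)
    full_space = list(itertools.product(range(10), repeat=3))  # 1000 actions
    print(static_knn_lookup(proxy, full_space, 1))  # scans all 1000 actions
    print(dynamic_neighborhood(proxy, 0, 9))        # only 6 candidates
    print(select_action(proxy, 0, 9, q))
```

In this toy three-dimensional example the static lookup sorts all 1000 actions, whereas the constructed neighborhood contains only six candidates around `(2, 8, 0)`. The gap widens with every added dimension, which is the scalability argument the abstract makes. A practical method would also need a way to escape poor local neighborhoods; this sketch deliberately omits that.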