We present a novel method for Deep Reinforcement Learning (DRL) that incorporates the convexity of the value function over the belief space in Partially Observable Markov Decision Processes (POMDPs). We introduce hard- and soft-enforced convexity as two different approaches, and compare their performance against standard DRL on two well-known POMDP environments, namely the Tiger and FieldVisionRockSample problems. Our findings show that including the convexity feature can substantially improve agent performance and increase robustness across the hyperparameter space, especially when testing on out-of-distribution domains. The source code for this work can be found at https://github.com/Dakout/Convex_DRL.