We present ContainerGym, a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task. The benchmark encodes a range of challenges commonly encountered in real-world sequential decision-making problems, such as uncertainty. It can be configured to instantiate problems of varying difficulty, e.g., in terms of variable dimensionality. Unlike other reinforcement learning benchmarks, including those that aim to encode real-world difficulties, ours is derived directly from a real-world industrial problem that underwent only minimal simplification and streamlining. It is sufficiently versatile to evaluate reinforcement learning algorithms on any real-world problem that fits our resource allocation framework. We provide results for standard baseline methods. Going beyond the usual training reward curves, our results and the statistical tools used to interpret them highlight interesting limitations of well-known deep reinforcement learning algorithms, namely PPO, TRPO and DQN.