This paper proposes a new algorithm, GMAB, that combines concepts from multi-armed bandits, a branch of reinforcement learning, with random search strategies from genetic algorithms to solve discrete stochastic optimization problems via simulation. The focus is on noisy large-scale problems, which often involve many dimensions as well as multiple local optima. Our aim is to combine the ability of multi-armed bandits to cope with volatile simulation observations with the ability of genetic algorithms to handle high-dimensional solution spaces containing an enormous number of feasible solutions. To this end, a multi-armed bandit framework serves as the foundation: each simulation observation is incorporated into the memory of GMAB. Based on this memory, genetic operators guide the search, as they provide powerful tools for both exploration and exploitation. The empirical results demonstrate that GMAB outperforms benchmark algorithms from the literature across a wide variety of test problems. In all experiments, GMAB required considerably fewer simulations to achieve similar or (far) better solutions than those generated by existing methods. At the same time, GMAB's runtime overhead is extremely small owing to the suggested tree-based implementation of its memory. Furthermore, we prove its convergence to the set of global optima as the simulation effort goes to infinity.
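To make the idea of a bandit memory steering genetic operators more concrete, the following is a minimal sketch and not the GMAB algorithm itself. It rests on several assumptions that do not come from the abstract: the memory is a plain dictionary rather than the tree-based structure the paper suggests, parents are chosen by a UCB1-style lower confidence bound (for minimization), and the simulation is a hypothetical noisy sphere function. All names (`noisy_objective`, `gmab_like_search`, `pop`, `budget`) are illustrative.

```python
import math
import random


def noisy_objective(x, rng):
    """Hypothetical simulation: sphere function plus Gaussian noise."""
    return sum(v * v for v in x) + rng.gauss(0.0, 1.0)


def gmab_like_search(dim=5, lo=-5, hi=5, budget=2000, pop=20, seed=0):
    """Bandit-style memory combined with genetic operators (illustrative only)."""
    rng = random.Random(seed)
    memory = {}  # solution tuple -> (number of simulations, running mean)
    total = 0    # total number of simulations performed so far

    def simulate(x):
        # Run one noisy simulation and fold the observation into the memory.
        nonlocal total
        y = noisy_objective(x, rng)
        n, mean = memory.get(x, (0, 0.0))
        n += 1
        memory[x] = (n, mean + (y - mean) / n)
        total += 1

    def bonus(x):
        # UCB1-style confidence radius (an assumption of this sketch).
        return math.sqrt(2.0 * math.log(total) / memory[x][0])

    def lcb(x):
        # Optimistic estimate for minimization: mean minus the radius.
        return memory[x][1] - bonus(x)

    # Seed the memory with distinct random solutions ("arms").
    while len(memory) < pop:
        simulate(tuple(rng.randint(lo, hi) for _ in range(dim)))

    while total < budget:
        # The most promising arms in memory act as the parent pool.
        elite = sorted(memory, key=lcb)[:pop]
        a, b = rng.sample(elite, 2)
        # Uniform crossover: each coordinate comes from either parent.
        child = tuple(rng.choice(pair) for pair in zip(a, b))
        # Mutation: shift a coordinate by +/-1 with probability 1/dim.
        child = tuple(
            min(hi, max(lo, v + rng.choice((-1, 1))))
            if rng.random() < 1.0 / dim else v
            for v in child
        )
        simulate(child)     # exploration through a new or revisited arm
        simulate(elite[0])  # exploitation: re-sample the most promising arm

    # Recommend the arm whose pessimistic estimate is lowest.
    best = min(memory, key=lambda x: memory[x][1] + bonus(x))
    return best, memory, total
```

Calling `gmab_like_search(budget=400)` returns the recommended solution, the memory, and the number of simulations used. Re-sampling the most promising arm on each iteration is one simple way to average out noise; the paper's actual selection rule and memory layout may differ.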