Solving a global optimization problem requires only a two-armed slot machine

For a general-purpose optimization problem over a finite rectangular region, this paper pioneers a unified slot machine framework for global optimization. The framework recasts the search for the global optimizer(s) as the construction of an optimal strategy for a bandit process over infinite policy sets, and we prove that a two-armed bandit suffices. Building on this strategic, bandit-process-driven optimization framework, we introduce a new {\bf S}trategic {\bf M}onte {\bf C}arlo {\bf O}ptimization (SMCO) algorithm that generates points coordinate-wise from multiple paired distributions and can be implemented in parallel for high-dimensional continuous functions. The SMCO algorithm is equipped with a tree search that broadens the slot machine's optimal-policy search space so that it can attain the global optimizer(s) of a multi-modal function, and it learns quickly through trial and error. We provide a strategic law of large numbers for nonlinear expectations in bandit settings and establish that the SMCO algorithm converges to the global optimizer(s) almost surely. Unlike standard gradient descent ascent (GDA), which climbs the mountain with a one-leg walk and is sensitive to the starting point and step size, SMCO takes a two-leg walk to the peak by sampling two-sidedly from the paired distributions, and it is not sensitive to the choice of initial point or to step-size constraints. Numerical studies demonstrate that SMCO outperforms GDA, particle swarm optimization and simulated annealing in both convergence accuracy and speed. We expect SMCO to be particularly useful for finding optimal tuning parameters in many large-scale, complex optimization problems.
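The abstract does not specify how SMCO samples points or selects arms. As a rough intuition for the "two-armed, two-sided" idea only, the Python sketch below runs a toy coordinate-wise search over a box. For each coordinate, one arm proposes a value to the left of the current point and the other proposes one to the right, and the UCB1 rule decides which arm to pull. UCB1 is a standard bandit rule swapped in here and is not taken from the paper. The function name `two_armed_coordinate_search`, the uniform half-interval proposals, the improvement-indicator reward and the test function `bowl` are all hypothetical. This is not the authors' SMCO algorithm, and it omits the tree search and the convergence theory.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def two_armed_coordinate_search(
    f: Callable[[List[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    n_iter: int = 2000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Hypothetical toy maximizer of f over the box [lower, upper].

    For each coordinate j there are two arms (a "paired" proposal):
      arm 0 samples uniformly from [lower[j], x[j]]  (step left),
      arm 1 samples uniformly from [x[j], upper[j]]  (step right).
    An arm earns reward 1 when its proposal improves f and 0 otherwise;
    the arm to pull is chosen with the UCB1 index (an assumption, not SMCO).
    """
    rng = random.Random(seed)
    d = len(lower)
    # Start at the centre of the box.
    x = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    fx = f(x)
    pulls = [[0, 0] for _ in range(d)]  # times each arm was pulled, per coordinate
    wins = [[0, 0] for _ in range(d)]   # times each arm produced an improvement

    for _ in range(n_iter):
        for j in range(d):
            if 0 in pulls[j]:
                # UCB1: try every arm once before using the index.
                arm = pulls[j].index(0)
            else:
                total = pulls[j][0] + pulls[j][1]
                arm = max(
                    (0, 1),
                    key=lambda a: wins[j][a] / pulls[j][a]
                    + math.sqrt(2.0 * math.log(total) / pulls[j][a]),
                )
            lo, hi = (lower[j], x[j]) if arm == 0 else (x[j], upper[j])
            y = list(x)
            y[j] = rng.uniform(lo, hi)
            fy = f(y)
            pulls[j][arm] += 1
            if fy > fx:  # accept only strict improvements
                x, fx = y, fy
                wins[j][arm] += 1
    return x, fx


def bowl(x: List[float]) -> float:
    # Hypothetical concave test function with its maximum at (0.3, -0.5).
    return -((x[0] - 0.3) ** 2) - (x[1] + 0.5) ** 2


best_x, best_f = two_armed_coordinate_search(bowl, [-1.0, -1.0], [1.0, 1.0])
print(best_x, best_f)
```

Because the sketch accepts only improving proposals, it is a greedy local search. Unlike what the abstract establishes for SMCO, it carries no guarantee of reaching the global optimizer of a multi-modal function.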
