Binary optimization has a wide range of applications in combinatorial optimization problems such as MaxCut, MIMO detection, and MaxSAT. However, these problems are typically NP-hard due to the binary constraints. We develop a novel probabilistic model that samples binary solutions from a parameterized policy distribution. Specifically, minimizing the KL divergence between the parameterized policy distribution and the Gibbs distribution of the function value leads to a stochastic optimization problem whose policy gradient can be derived explicitly, as in reinforcement learning. For coherent exploration in discrete spaces, parallel Markov chain Monte Carlo (MCMC) methods are employed to draw diverse samples from the policy distribution and to approximate the gradient efficiently. We further develop a filter scheme that replaces the original objective function with one that incorporates a local search technique, broadening the view of the function landscape. Convergence of the policy gradient method to stationary points in expectation is established using a concentration inequality for MCMC. Numerical results show that this framework is promising for providing near-optimal solutions to a number of binary optimization problems.
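To make the KL-to-policy-gradient step concrete, here is a minimal sketch on MaxCut. It is not the authors' implementation and rests on several assumptions that the abstract does not state. The policy is taken to be a mean-field (independent Bernoulli) distribution with logits `theta`. Samples are drawn directly from that policy rather than by parallel MCMC. The local-search filter is omitted. The names `cut_value`, `policy_gradient_maxcut`, and the temperature `lam` are hypothetical. With a Gibbs distribution proportional to exp(-f(x)/lam), scaling the KL divergence by `lam` gives the objective E[f(x) + lam * log p_theta(x)] up to a constant, and its gradient is E[(f(x) + lam * log p_theta(x)) * grad log p_theta(x)].

```python
import numpy as np


def cut_value(W, X):
    """Cut weight of each row of X (entries in {-1, +1}).

    Assumes W is a symmetric weight matrix with zero diagonal.
    """
    return 0.25 * (W.sum() - np.einsum("ki,ij,kj->k", X, W, X))


def policy_gradient_maxcut(W, lam=0.5, lr=0.1, batch=64, iters=300, seed=0):
    """Entropy-regularized policy gradient for MaxCut (illustrative sketch).

    Assumption: mean-field Bernoulli policy sampled directly, standing in
    for the parallel MCMC sampler described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    theta = np.zeros(n)  # logits; sigmoid(theta_i) = P(x_i = +1)
    best_x, best_cut = None, -np.inf
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-theta))
        B = (rng.random((batch, n)) < mu).astype(float)  # bits in {0, 1}
        X = 2.0 * B - 1.0                                # spins in {-1, +1}
        cuts = cut_value(W, X)
        f = -cuts                                        # minimize f = -cut
        logp = (B * np.log(mu + 1e-12)
                + (1.0 - B) * np.log(1.0 - mu + 1e-12)).sum(axis=1)
        adv = f + lam * logp
        adv = adv - adv.mean()        # baseline to reduce variance
        # For a Bernoulli policy with logits theta, grad log p = B - mu.
        grad = (adv[:, None] * (B - mu)).mean(axis=0)
        theta -= lr * grad
        k = int(cuts.argmax())
        if cuts[k] > best_cut:        # keep the best sample seen so far
            best_cut, best_x = float(cuts[k]), X[k].copy()
    return best_x, best_cut


if __name__ == "__main__":
    # 4-cycle graph: edges (0,1), (1,2), (2,3), (3,0), unit weights.
    W = np.zeros((4, 4))
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
        W[i, j] = W[j, i] = 1.0
    x, c = policy_gradient_maxcut(W)
    print(x, c)
```

Subtracting the batch mean from the weights is a standard variance-reduction baseline. It keeps the estimator's expectation unchanged up to a factor of (1 - 1/batch), because the score function has zero mean under the policy. Replacing the direct sampler with MCMC chains and applying local search to each sample before evaluating `f` would move the sketch closer to the scheme the abstract describes.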