We propose an actor-critic algorithm for a family of complex problems arising in algebraic statistics and discrete optimization. The core task is to produce a sample from a finite subset of the non-negative integer lattice defined by a high-dimensional polytope. We translate the problem into a Markov decision process and devise an actor-critic reinforcement learning (RL) algorithm to learn a set of good moves that can be used for sampling. We prove that the actor-critic algorithm converges to an approximately optimal sampling policy. To tackle the complexity issues that typically arise in these sampling problems, and to allow RL to function at scale, our solution strategy takes three steps: decomposing the starting point of the sample, applying RL to each induced subproblem, and reconstructing a sample in the original polytope. In this setup, the proof of convergence applies to each subproblem in the decomposition. We test the method in two regimes. First, in statistical applications, a high-dimensional polytope arises as the support set of the reference distribution in a model/data fit test for a broad family of statistical models for categorical data. We demonstrate how RL can be used for model fit testing on data sets for which traditional MCMC samplers converge too slowly because of problem size and sparsity structure. Second, to test the robustness of the algorithm and explore its generalization properties, we apply it to synthetically generated data of various sizes and sparsity levels.
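To make the sampling task concrete, the following minimal sketch walks over the non-negative integer points that share the same margins as a starting 2x2 table. It is not the actor-critic method described above: the moves are fixed by hand and chosen uniformly at random, whereas the proposed algorithm learns which moves to apply. The matrix `A`, the list `MOVES`, and the helper names `margins`, `step`, and `random_walk` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy instance: 2x2 tables flattened as [x11, x12, x21, x22].
# Each row of A computes one margin (a linear constraint of the polytope).
A = [
    [1, 1, 0, 0],  # row 1 sum
    [0, 0, 1, 1],  # row 2 sum
    [1, 0, 1, 0],  # column 1 sum
    [0, 1, 0, 1],  # column 2 sum
]

# Integer vectors in the kernel of A: adding one never changes the margins.
MOVES = [(1, -1, -1, 1), (-1, 1, 1, -1)]


def margins(A, x):
    """Return A @ x as a tuple."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)


def step(x, move):
    """Apply a move; stay put if it would produce a negative entry."""
    y = tuple(xi + mi for xi, mi in zip(x, move))
    return y if min(y) >= 0 else x


def random_walk(start, moves, n_steps, seed=0):
    """Uniform random walk over lattice points with the margins of `start`."""
    rng = random.Random(seed)
    x = tuple(start)
    visited = {x}
    for _ in range(n_steps):
        x = step(x, rng.choice(moves))
        visited.add(x)
    return x, visited
```

In Markov decision process terms, the tuple `x` plays the role of the state and the chosen move plays the role of the action; a learned policy would replace the uniform `rng.choice(moves)` call in this sketch.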