This paper studies the classic maximum entropy sampling problem (MESP), which aims to select the most informative principal submatrix of a prespecified size from a covariance matrix. MESP has been widely applied in many areas, including healthcare, power systems, manufacturing, and data science. By investigating its Lagrangian dual and primal characterization, we derive a novel convex integer program for MESP and show that its continuous relaxation yields a near-optimal solution. These results motivate us to study an efficient sampling algorithm and to develop an approximation bound for it on MESP, which improves on the best-known bound in the literature. We then provide an efficient deterministic implementation of the sampling algorithm that attains the same approximation bound. By developing new mathematical tools for singular matrices and analyzing the Lagrangian dual of the proposed convex integer program, we investigate the widely used local search algorithm and prove the first known approximation bound for it on MESP. The proof techniques further lead to an efficient implementation of the local search algorithm. Our numerical experiments demonstrate that these approximation algorithms can efficiently solve medium-sized and large-scale instances to near-optimality. The proposed algorithms are released as open-source software. Finally, we extend the analyses to the A-Optimal MESP (A-MESP), where the objective is to minimize the trace of the inverse of the selected principal submatrix.
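To make the two objectives and the local search concrete, the following is a minimal sketch, assuming the standard MESP formulation: maximize log det C[S, S] over index sets S of size s. It also evaluates the A-MESP objective, the trace of the inverse of C[S, S]. The local search shown is a generic first-improvement swap heuristic. It is not claimed to be the paper's efficient implementation. The function names `mesp_objective`, `amesp_objective`, `local_search`, and `brute_force` are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def mesp_objective(C: np.ndarray, S) -> float:
    """MESP objective: log det of the principal submatrix C[S, S]."""
    idx = np.asarray(sorted(S))
    sign, logdet = np.linalg.slogdet(C[np.ix_(idx, idx)])
    # A singular (or non-PD) submatrix gets the worst possible value.
    return float(logdet) if sign > 0 else -np.inf

def amesp_objective(C: np.ndarray, S) -> float:
    """A-MESP objective (to be minimized): trace of the inverse of C[S, S]."""
    idx = np.asarray(sorted(S))
    return float(np.trace(np.linalg.inv(C[np.ix_(idx, idx)])))

def local_search(C: np.ndarray, s: int, S0=None):
    """Hypothetical first-improvement swap local search for MESP.

    Repeatedly swaps one selected index i for one unselected index j
    whenever the swap strictly increases log det, until no swap helps.
    """
    n = C.shape[0]
    S = set(range(s)) if S0 is None else set(S0)
    best = mesp_objective(C, S)
    improved = True
    while improved:
        improved = False
        for i in sorted(S):
            for j in range(n):
                if j in S:
                    continue
                T = (S - {i}) | {j}
                val = mesp_objective(C, T)
                if val > best + 1e-12:  # accept the first strict improvement
                    S, best, improved = T, val, True
                    break
            if improved:
                break  # restart the scan from the new solution
    return sorted(S), best

def brute_force(C: np.ndarray, s: int) -> float:
    """Exact MESP optimum by enumeration (only for tiny n)."""
    n = C.shape[0]
    return max(mesp_objective(C, S) for S in itertools.combinations(range(n), s))
```

For example, with `C = np.diag([1.0, 5.0, 2.0, 7.0, 3.0])` and `s = 2`, `local_search(C, 2)` returns indices `[1, 3]` with objective `log(35)`. Because a swap-based local search only guarantees local optimality, comparing it against `brute_force` on small random instances is a simple sanity check.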