Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, because the number of candidate subsets grows exponentially with the dimensionality of the problem. As a result, finding a globally optimal solution via an exact optimization method may take an impractical amount of CPU time for problems with thousands of predictors. This motivates suboptimal procedures that provide good approximate solutions at far lower computational cost than exact methods. In this work, we introduce a new procedure and compare it with other popular suboptimal algorithms for solving the best subset selection problem. Extensive computational experiments using synthetic and real data have been performed. The results provide insight into the performance of these methods across different data settings. The new procedure is observed to be a competitive suboptimal algorithm for solving the best subset selection problem on high-dimensional data.
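To make the combinatorial growth concrete, the sketch below (not part of the paper, just an illustration) counts the candidate subsets an exact method would have to consider: choosing a subset of size k among p predictors gives C(p, k) candidates, and enumerating all non-empty subsets gives 2^p - 1.

```python
from math import comb

def n_subsets(p: int, k: int) -> int:
    """Number of candidate subsets of size k among p predictors."""
    return comb(p, k)

for p in (10, 100, 1000):
    # Subsets of size 10, and all non-empty subsets, for each dimension p.
    print(f"p={p}: C(p,10)={n_subsets(p, 10)}, all subsets=2^p-1={2**p - 1}")
```

Even at p = 100 the size-10 count is already on the order of 10^13, which is why exhaustive enumeration is hopeless at the scale of thousands of predictors.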