We present a new optimization method for the group selection problem in linear regression. In this problem, the predictors are assumed to have a natural group structure, and the goal is to select a small set of groups that best fits the response. Incorporating this group structure into the model is key to obtaining better estimators and to identifying associations between the response and the predictors. Such a discrete constrained problem is well known to be hard, particularly in high-dimensional settings where the number of predictors is much larger than the number of observations. We propose to tackle this problem by reformulating the underlying discrete, binary-constrained problem as an unconstrained continuous optimization problem. We compare the performance of the proposed approach with state-of-the-art variable selection strategies on simulated data sets, and we illustrate its effectiveness on a genetic dataset, where it is used to identify groups of markers across chromosomes.
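To make the reformulation concrete, the following is a generic sketch of how a binary-constrained group selection problem can be written and then relaxed. The notation ($y$, $X_g$, $\beta_g$, $s$, $k$, $w$, $\lambda$, $f$) and the particular smooth map $t(\cdot)$ are assumptions introduced here for illustration only; the abstract does not specify the formulation actually used.

```latex
% Assumed notation: y in R^n is the response; the p predictors are split
% into G groups with design blocks X_1, ..., X_G and coefficient blocks
% beta_1, ..., beta_G; k is a budget on the number of selected groups.
\begin{align*}
  % Discrete problem: s_g = 1 if group g is selected, 0 otherwise.
  &\min_{\beta,\; s \in \{0,1\}^G}\;
    \frac{1}{n}\Bigl\| y - \sum_{g=1}^{G} s_g X_g \beta_g \Bigr\|_2^2
    \quad \text{subject to} \quad \sum_{g=1}^{G} s_g \le k, \\[4pt]
  % Illustrative smooth map from an unconstrained w_g to [0, 1).
  &t(w_g) = 1 - \exp\bigl(-w_g^2\bigr), \qquad w_g \in \mathbb{R}, \\[4pt]
  % Unconstrained continuous surrogate: f is some smooth extension of the
  % discrete objective to [0,1]^G (left unspecified here), and lambda >= 0
  % is a penalty that plays the role of the budget k.
  &\min_{w \in \mathbb{R}^G}\;
    f\bigl(t(w_1), \dots, t(w_G)\bigr) + \lambda \sum_{g=1}^{G} t(w_g).
\end{align*}
```

In a sketch of this kind, the search over $2^G$ binary vectors is replaced by a search over $\mathbb{R}^G$ that gradient-based methods can handle, and a group would be declared selected when its $t(w_g)$ is close to 1.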