Given a high-dimensional covariate matrix and a response vector, ridge-regularized sparse linear regression selects a subset of features that explains the relationship between covariates and the response in an interpretable manner. To choose hyperparameters that control the sparsity level and the amount of regularization, practitioners commonly use k-fold cross-validation. However, cross-validation substantially increases the computational cost of sparse regression, as it requires solving many mixed-integer optimization problems (MIOs) for each hyperparameter combination. To address this computational burden, we derive computationally tractable relaxations of the k-fold cross-validation loss, facilitating hyperparameter selection while solving $50$--$80\%$ fewer MIOs in practice. Our computational results demonstrate, across eleven real-world UCI datasets, that exact MIO-based cross-validation can be competitive with mature software packages such as glmnet and L0Learn, particularly when the sample-to-feature ratio is small.