The sparse-group lasso performs both variable and group selection, making simultaneous use of the strengths of the lasso and group lasso. It has found widespread use in genetics, a field that regularly involves the analysis of high-dimensional data, due to its sparse-group penalty, which allows it to utilize grouping information. However, the sparse-group lasso can be computationally more expensive than both the lasso and group lasso, due to the added shrinkage complexity and the additional hyper-parameter that needs tuning. In this paper, a novel feature reduction method, Dual Feature Reduction (DFR), is presented that uses strong screening rules for the sparse-group lasso and the adaptive sparse-group lasso to reduce their input space before optimization. DFR applies two layers of screening and is based on the dual norms of the sparse-group lasso and adaptive sparse-group lasso. Through synthetic and real numerical studies, it is shown that the proposed feature reduction approach is able to drastically reduce the computational cost in many different scenarios.
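To make the sparse-group penalty mentioned above concrete, the following is a minimal sketch in Python of the standard convex combination of lasso and group-lasso terms, in the commonly used form with √(group size) weights. The function name and the `groups` encoding (one group id per coefficient) are illustrative choices, not notation from this paper.

```python
import numpy as np

def sparse_group_penalty(beta, groups, lam, alpha):
    """Sparse-group lasso penalty in the common form:
        alpha * lam * ||beta||_1
        + (1 - alpha) * lam * sum_g sqrt(p_g) * ||beta_g||_2,
    where p_g is the size of group g. alpha=1 recovers the lasso
    penalty; alpha=0 recovers the group lasso penalty."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    l1 = np.abs(beta).sum()                      # lasso term: element-wise sparsity
    l2 = 0.0
    for g in np.unique(groups):                  # group lasso term: group sparsity
        beta_g = beta[groups == g]
        l2 += np.sqrt(beta_g.size) * np.linalg.norm(beta_g)
    return alpha * lam * l1 + (1 - alpha) * lam * l2
```

Tuning both `lam` and the mixing parameter `alpha` is what introduces the extra hyper-parameter cost noted in the abstract, and the two penalty terms together are what make the shrinkage (and hence screening) more involved than for the lasso or group lasso alone.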