The fused Lasso was proposed to promote sparsity in both the coefficients and their successive differences in linear regression. Because of its wide range of applications, many algorithms exist for solving it. However, solving the model is time-consuming on high-dimensional data sets. To accelerate the computation of the fused Lasso in this setting, we build a safe feature identification rule by introducing an extra dual variable. At low computational cost, this rule eliminates inactive features, whose coefficients are zero, and identifies adjacent features that share the same coefficient in the solution. To the best of our knowledge, existing screening rules cannot be applied to speed up the computation of the fused Lasso, and our work is the first to address this problem. To emphasize that our rule is unique in its ability to identify adjacent features with the same coefficient, we call it the safe feature identification rule. Numerical experiments on simulated and real data show that the rule reduces the computation time of the fused Lasso. In addition, the rule can be embedded into any efficient algorithm to speed up its solution of the fused Lasso.
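To illustrate why identifying zero and equal adjacent coefficients shrinks the problem, the following is a minimal sketch. It does not implement the rule itself, which the abstract does not specify. It assumes the standard Lagrangian form of the fused Lasso objective and assumes the rule's output is already available as a list of adjacent feature runs. The function names, the `groups` format, and the parameters `lam1` and `lam2` are hypothetical and chosen only for illustration.

```python
def fused_lasso_objective(X, y, beta, lam1, lam2):
    """0.5 * ||y - X beta||^2 + lam1 * sum|beta_j| + lam2 * sum|beta_j - beta_{j-1}|."""
    fitted = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]
    loss = 0.5 * sum((y_i - f_i) ** 2 for y_i, f_i in zip(y, fitted))
    sparsity = lam1 * sum(abs(b) for b in beta)
    fusion = lam2 * sum(abs(b - a) for a, b in zip(beta, beta[1:]))
    return loss + sparsity + fusion


def reduce_design(X, groups):
    """Drop the columns of zero runs and sum the columns of each fused run.

    groups: list of (start, stop, is_zero) half-open runs of adjacent
    features covering all columns (assumed format, not from the paper).
    """
    kept = [(s, e) for s, e, is_zero in groups if not is_zero]
    return [[sum(row[s:e]) for s, e in kept] for row in X]


def expand_coefficients(gamma, groups, p):
    """Map one value per kept run back to a full coefficient vector of length p."""
    beta = [0.0] * p
    kept = [(s, e) for s, e, is_zero in groups if not is_zero]
    for value, (s, e) in zip(gamma, kept):
        beta[s:e] = [value] * (e - s)
    return beta


# Toy data: six features, where features 0-1 share a coefficient,
# features 2-3 are inactive, and features 4-5 share a coefficient.
X = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     [0.5, -1.0, 2.0, 0.0, 1.0, -2.0]]
groups = [(0, 2, False), (2, 4, True), (4, 6, False)]

X_red = reduce_design(X, groups)          # 2 columns instead of 6
gamma = [1.5, -0.7]                       # one coefficient per kept run
beta = expand_coefficients(gamma, groups, 6)

# The fitted values of the full and reduced models agree.
full_fit = [sum(x * b for x, b in zip(row, beta)) for row in X]
red_fit = [sum(x * g for x, g in zip(row, gamma)) for row in X_red]
assert all(abs(a - b) < 1e-12 for a, b in zip(full_fit, red_fit))
```

In this sketch the solver only needs to work with the two columns of `X_red` rather than the six columns of `X`. The penalty terms of the reduced problem also change with the run lengths, which is omitted here for brevity.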