A unified fused Lasso approach for sparse and blocky feature selection in regression and classification

Sparse and blocky coefficients arise in many regression and classification problems. The fused Lasso was designed to recover such sparse, structured features, especially when the design matrix is ultrahigh dimensional. Quantile loss is well known as a robust loss function in regression and classification. In this paper, we combine the quantile loss with the fused Lasso penalty to obtain the quantile fused Lasso, which achieves sparse and blocky feature selection in both regression and classification. Interestingly, the proposed model has a unified optimization formulation for regression and classification. For ultrahigh dimensional data, we derive a multi-block linearized alternating direction method of multipliers (LADMM) to solve it. Moreover, we prove convergence of the proposed LADMM algorithm and derive its convergence rates by an elegant argument. The algorithm can also be easily extended to solve many existing fused Lasso models. Finally, we present numerical results on several synthetic and real-world examples, which illustrate the robustness, scalability, and accuracy of the proposed method.
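To make the two ingredients concrete, here is a minimal sketch of a quantile fused Lasso objective for the regression case. It assumes the textbook definitions: the check (pinball) loss ρ_τ(r) = r(τ − 1[r < 0]) and the fused Lasso penalty λ1‖β‖1 + λ2 Σ_j |β_j − β_{j−1}|. The function names, the 1/n scaling, and the parameter names `tau`, `lam1`, and `lam2` are illustrative assumptions, not taken from the paper, whose exact formulation (including how classification is cast in the same form) may differ.

```python
import numpy as np


def quantile_loss(residuals, tau):
    """Check (pinball) loss summed over samples: r * (tau - 1[r < 0])."""
    residuals = np.asarray(residuals, dtype=float)
    indicator = (residuals < 0).astype(float)
    return float(np.sum(residuals * (tau - indicator)))


def fused_lasso_penalty(beta, lam1, lam2):
    """lam1 * sum |beta_j| (sparsity) + lam2 * sum |beta_j - beta_{j-1}| (blockiness)."""
    beta = np.asarray(beta, dtype=float)
    sparsity = np.sum(np.abs(beta))
    blockiness = np.sum(np.abs(np.diff(beta)))
    return float(lam1 * sparsity + lam2 * blockiness)


def quantile_fused_lasso_objective(X, y, beta, tau=0.5, lam1=1.0, lam2=1.0):
    """Average check loss of the residuals y - X @ beta plus the fused Lasso penalty.

    The 1/n scaling is an assumption made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    residuals = y - X @ beta
    return quantile_loss(residuals, tau) / len(y) + fused_lasso_penalty(beta, lam1, lam2)
```

With `tau = 0.5` the check loss is half the absolute error, so the sketch reduces to a least-absolute-deviation fit with a fused Lasso penalty. The first penalty term pushes individual coefficients to zero, and the second pushes neighboring coefficients to be equal, which is what produces blocky solutions.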

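Feature selection

Feature selection, also called feature subset selection (FSS) or attribute selection, means choosing N of the M available features so that a given performance measure of the system is optimized. It is the process of selecting the most effective features from the original set in order to reduce the dimensionality of the dataset, an important means of improving the performance of learning algorithms, and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training the model.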
