The Lasso has become a benchmark data analysis procedure, and numerous variants have been proposed in the literature. Although Lasso formulations are stated so that overall prediction error is optimized, they allow no full control over the prediction accuracy for certain individuals of interest. In this work we propose a novel version of the Lasso in which quadratic performance constraints are added to Lasso-based objective functions, so that threshold values bound the prediction errors in the different groups of interest (not necessarily disjoint). As a result, a constrained sparse regression model is defined by a nonlinear optimization problem. This cost-sensitive constrained Lasso has a direct application to heterogeneous samples in which data are collected from distinct sources, as is standard in many biomedical contexts. Both theoretical properties and empirical studies concerning the new method are explored in this paper. In addition, two illustrations of the method in biomedical and sociological contexts are presented.
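
For concreteness, one way to write such a problem is sketched below. The notation is an assumption made here for illustration and is not taken from the paper: $X$ and $y$ denote the full design matrix and response with $n$ observations, $X_g$ and $y_g$ the $n_g$ observations belonging to group $g$, $\lambda \ge 0$ the sparsity penalty, and $f_g > 0$ the error threshold set for group $g$. The paper's own formulation may differ in scaling or in the choice of loss.

```latex
% Hedged sketch of a constrained Lasso with quadratic performance constraints.
% All symbols (X, y, X_g, y_g, n, n_g, \lambda, f_g, G) are illustrative assumptions.
\begin{equation*}
\begin{aligned}
\min_{\beta \in \mathbb{R}^{p}} \quad
  & \frac{1}{n}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\text{subject to} \quad
  & \frac{1}{n_g}\,\lVert y_g - X_g\beta \rVert_2^2 \le f_g,
  \qquad g = 1,\dots,G .
\end{aligned}
\end{equation*}
```

In this sketch the groups may overlap, since each constraint involves only its own rows $(X_g, y_g)$, and the feasible set may be empty if the thresholds $f_g$ are chosen too small.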