Performance of $\ell_1$ Regularization for Sparse Convex Optimization

Despite the widespread adoption of the LASSO and Group LASSO in practice, guarantees for these algorithms are strikingly lacking in settings beyond statistical problems, and they are usually regarded as heuristics in the context of sparse convex optimization on deterministic inputs. We give the first recovery guarantees for the Group LASSO for sparse convex optimization with vector-valued features. We show that if a sufficiently large Group LASSO regularization is applied when minimizing a strictly convex function $l$, then the minimizer is a sparse vector supported on the vector-valued features with the largest $\ell_2$ norm of the gradient. Thus, repeating this procedure selects the same set of features as the Orthogonal Matching Pursuit algorithm, which admits recovery guarantees for any function $l$ with restricted strong convexity and smoothness via weak submodularity arguments. This answers open questions of Tibshirani et al. and Yasuda et al. Our result is the first to theoretically explain the empirical success of the Group LASSO for convex functions on general input instances, assuming only restricted strong convexity and smoothness. Our result also generalizes the provable guarantees for the Sequential Attention algorithm of Yasuda et al., a feature selection algorithm inspired by the attention mechanism. As an application, we give new results for the column subset selection problem, which is well studied when the loss is the Frobenius norm or another entrywise matrix loss. We give the first result for this problem under general loss functions, requiring only restricted strong convexity and smoothness.
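
The selection rule described above (repeatedly pick the group of features whose gradient has the largest $\ell_2$ norm, then re-minimize over the selected groups) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the least-squares loss $l(\beta) = \tfrac{1}{2}\|X\beta - y\|_2^2$ as a stand-in for a general strictly convex $l$, and the function name `group_omp` and its arguments are hypothetical names chosen for illustration.

```python
import numpy as np


def group_omp(X, y, groups, k):
    """Greedy group selection in the style of Orthogonal Matching Pursuit.

    Assumption: the loss is 0.5 * ||X @ beta - y||^2 (least squares).
    `groups` is a list of column-index lists, one list per vector-valued feature.
    Returns the indices of the selected groups and the refitted coefficients.
    """
    selected = []
    beta = np.zeros(X.shape[1])
    for _ in range(k):
        # Gradient of the least-squares loss at the current iterate.
        grad = X.T @ (X @ beta - y)
        # Score each unselected group by the l2 norm of its gradient block.
        scores = [
            -np.inf if g in selected else np.linalg.norm(grad[idx])
            for g, idx in enumerate(groups)
        ]
        selected.append(int(np.argmax(scores)))
        # Re-minimize the loss restricted to the columns of the selected groups.
        cols = np.concatenate([groups[g] for g in selected])
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        beta = np.zeros(X.shape[1])
        beta[cols] = coef
    return selected, beta


# Toy instance: three groups of two columns each, orthonormal design.
X = np.eye(6)
y = np.array([3.0, 4.0, 0.0, 0.0, 1.0, 0.0])
groups = [[0, 1], [2, 3], [4, 5]]
selected, beta = group_omp(X, y, groups, k=2)
print(selected)  # [0, 2]
```

In this toy instance the gradient at zero is $-y$, so the group norms are 5, 0, and 1. The first group is chosen first, and after refitting only the third group has a nonzero gradient, so it is chosen second.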
