This paper presents a framework for smooth optimization of objectives with $\ell_q$ and $\ell_{p,q}$ regularization for (structured) sparsity. Solving these non-smooth and possibly non-convex problems typically requires specialized optimization routines. In contrast, the method studied here is compatible with the off-the-shelf (stochastic) gradient descent that is ubiquitous in deep learning, which enables differentiable sparse regularization without approximations. The proposed optimization transfer consists of an overparametrization of selected model parameters followed by a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization induces non-smooth and non-convex regularization in the original parametrization. We show that the resulting surrogate problem not only has the same global optimum but also exactly preserves the local minima. This is particularly useful for non-convex regularization, where finding global solutions is NP-hard and local minima often generalize well. We provide an integrative overview that consolidates various strands of the literature on sparsity-inducing parametrizations in a general setting, and we meaningfully extend existing approaches. Numerical experiments demonstrate the feasibility of our approach, which matches or outperforms common implementations of convex and non-convex regularizers.
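The abstract does not spell out the overparametrization. As a minimal sketch, assume the Hadamard-product parametrization $\beta = u \cdot v$ that is common in the literature on sparsity-inducing parametrizations (an assumption, not necessarily the paper's exact construction). The Python code below illustrates the optimization transfer on a scalar lasso problem. The smooth penalty $\tfrac{\lambda}{2}(u^2 + v^2)$ is minimized over all factorizations of $\beta$ at $\lambda|\beta|$ by the AM-GM inequality. More generally, with $d$ factors it gives $\tfrac{\lambda d}{2}|\beta|^{2/d}$, an $\ell_q$ penalty with $q = 2/d$. Plain gradient descent on the smooth surrogate then recovers the soft-thresholding solution of the original non-smooth problem. All function names (`soft_threshold`, `surrogate_penalty`, `induced_lq_penalty`, `fit_overparametrized`) are hypothetical helpers for illustration.

```python
# Minimal sketch (assumption): Hadamard-product parametrization beta = u * v.
# The paper's own construction may be more general; these helpers are hypothetical.
from typing import Sequence


def soft_threshold(z: float, lam: float) -> float:
    """Exact minimizer of the non-smooth lasso objective 0.5*(z - b)**2 + lam*|b|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def surrogate_penalty(factors: Sequence[float], lam: float) -> float:
    """Smooth l2 penalty 0.5*lam*sum(u_k**2) on the overparametrized factors."""
    return 0.5 * lam * sum(u * u for u in factors)


def induced_lq_penalty(beta: float, lam: float, depth: int) -> float:
    """Minimum of surrogate_penalty over all factorizations beta = u_1*...*u_depth.

    By the AM-GM inequality this equals 0.5*lam*depth*|beta|**(2/depth), i.e. an
    l_q penalty with q = 2/depth (q = 1 for depth 2, non-convex for depth > 2).
    """
    return 0.5 * lam * depth * abs(beta) ** (2.0 / depth)


def fit_overparametrized(z: float, lam: float, lr: float = 0.01,
                         steps: int = 20000) -> float:
    """Plain gradient descent on the smooth surrogate
    0.5*(z - u*v)**2 + 0.5*lam*(u**2 + v**2); returns beta = u*v."""
    # Asymmetric start: with u == v the updates stay symmetric and u*v >= 0 forever.
    u, v = 0.5, 0.3
    for _ in range(steps):
        r = u * v - z                    # residual in the original parameter
        gu = r * v + lam * u             # partial derivative w.r.t. u
        gv = r * u + lam * v             # partial derivative w.r.t. v
        u, v = u - lr * gu, v - lr * gv  # simultaneous gradient step
    return u * v


if __name__ == "__main__":
    lam = 0.5
    for z in (2.0, 0.3, -2.0):
        # Compare the closed-form lasso solution with the surrogate's solution.
        print(z, soft_threshold(z, lam), round(fit_overparametrized(z, lam), 6))
```

In this toy setting the gradient-descent estimate should agree with `soft_threshold` up to numerical precision, including exact shrinkage to zero when $|z| \le \lambda$. The sketch only covers the convex depth-2 case for a single coefficient. It does not reproduce the paper's results on local minima or structured $\ell_{p,q}$ penalties.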
翻译:本文提出一种针对(结构化)稀疏性下 $\ell_q$ 和 $\ell_{p,q}$ 正则化目标函数的光滑优化框架。求解这些非光滑且可能非凸的问题通常需要专门的优化算法。相比之下,本文研究的方法与深度学习领域广泛使用的现成(随机)梯度下降算法兼容,从而无需近似即可实现可微分的稀疏正则化。所提出的优化转移包含对选定模型参数的过参数化及随后的惩罚项变换。在过参数化问题中,光滑且凸的 $\ell_2$ 正则化会在原始参数化中诱导出非光滑且非凸的正则化项。我们证明,由此产生的替代问题不仅具有相同的全局最优解,而且精确保留了局部极小值。这一特性在非凸正则化中尤其有用——全局解求解属于NP难问题,而局部极小值通常具有良好的泛化性能。我们通过整合性综述将不同文献中关于稀疏诱导参数化方法统一于一般框架,并对现有方法进行了有意义的拓展。数值实验验证了所提方法的可行性,表明其在凸与非凸正则化实现中能达到或超越常见实现的性能。