We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. These non-smooth and possibly non-convex problems typically rely on solvers tailored to specific models and regularizers. In contrast, our method enables fully differentiable and approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning. The proposed optimization transfer comprises an overparametrization of selected parameters and a change of penalties. In the overparametrized problem, smooth surrogate regularization induces non-smooth, sparse regularization in the base parametrization. We prove that the surrogate objective is equivalent in the sense that it not only has identical global minima but also matching local minima, thereby avoiding the introduction of spurious solutions. Additionally, our theory establishes results of independent interest regarding matching local minima for arbitrary, potentially unregularized, objectives. We comprehensively review sparsity-inducing parametrizations across different fields that are covered by our general theory, extend their scope, and propose improvements in several aspects. Numerical experiments further demonstrate the correctness and effectiveness of our approach on several sparse learning problems ranging from high-dimensional regression to sparse neural network training.
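To make the idea of "smooth surrogate regularization inducing sparse regularization" concrete, the following is a minimal sketch of one well-known sparsity-inducing parametrization, the Hadamard product parametrization β = u ⊙ v with squared L2 penalties on u and v. It relies on the identity min over u·v = β of (u² + v²)/2 = |β|, which follows from the AM-GM inequality. The abstract does not state which parametrizations the authors use, so this choice is an assumption made for illustration, as are the function names `soft_threshold` and `hadamard_gd`, the identity design matrix, the step size, and the starting point.

```python
import numpy as np

def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5*||y - beta||^2 + lam*||beta||_1."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def hadamard_gd(y, lam, lr=0.1, steps=3000):
    """Plain gradient descent on the smooth surrogate
    0.5*||y - u*v||^2 + 0.5*lam*(||u||^2 + ||v||^2),
    returning beta = u*v (hypothetical helper, for illustration only)."""
    # Asymmetric start: with u == v the product u*v could never turn negative.
    u = np.full_like(y, 1.0)
    v = np.full_like(y, 0.5)
    for _ in range(steps):
        r = y - u * v                # residual in the base parametrization
        grad_u = -r * v + lam * u    # gradient of the surrogate w.r.t. u
        grad_v = -r * u + lam * v    # gradient of the surrogate w.r.t. v
        u, v = u - lr * grad_u, v - lr * grad_v
    return u * v

y = np.array([3.0, -2.0, 0.5, -0.3, 0.0])
lam = 1.0
beta_gd = hadamard_gd(y, lam)        # smooth problem, gradient descent only
beta_lasso = soft_threshold(y, lam)  # non-smooth L1 problem, closed form
print(np.round(beta_gd, 4))
print(beta_lasso)
```

In this toy setting the gradient descent iterate on the smooth surrogate should agree with the soft-thresholding solution of the L1-penalized problem up to numerical tolerance, including the entries that are driven to zero. The sketch says nothing about the structured or non-convex penalties mentioned in the abstract.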