Sparse learners compress the inputs (features) by selecting only those needed for good generalization. Linear models with LASSO-type regularization achieve this by setting the weights of irrelevant features to zero, effectively identifying and ignoring them. In artificial neural networks, the same selective focus can be achieved by pruning the input layer. Given a cost function augmented with a sparsity-promoting penalty, our proposal selects a regularization parameter $\lambda$ (without the use of cross-validation or a validation set) that creates a local minimum of the cost function at the origin, where no features are selected. This local minimum acts as a baseline: if no signal is strong enough to justify a feature's inclusion, the local minimum remains at zero with a high prescribed probability. The method is flexible, applying to complex models ranging from shallow to deep artificial neural networks and supporting various cost functions and sparsity-promoting penalties. We empirically show a remarkable phase transition in the probability of retrieving the relevant features, as well as good generalization thanks to the choice of $\lambda$, the non-convex penalty, and the optimization scheme developed. This approach can be seen as a form of compressed sensing for complex models, distilling high-dimensional data into a compact, interpretable subset of meaningful features.
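To make the LASSO mechanism mentioned above concrete, here is a minimal sketch (not the paper's method) of sparse feature selection in a linear model via proximal gradient descent (ISTA). The soft-thresholding step sets the weights of features with insufficient signal exactly to zero; the data, dimensions, and the value of the penalty `lam` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of the l1 penalty: shrinks toward zero,
    # zeroing out entries whose magnitude is below the threshold t.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Minimize (1/2n)||Xw - y||^2 + lam * ||w||_1 by iterating a
    # gradient step on the smooth part followed by soft-thresholding.
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Toy data: 10 features, only the first two carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(200)

w = lasso_ista(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-8)
print(selected)  # the two relevant features should be among those retained
```

The exact zeros produced by soft thresholding are what makes the fitted model interpretable: the retained support is the selected feature subset, which is the role the input-layer penalty plays for neural networks in the abstract.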