In recent years, there has been growing interest in combining techniques from statistics and machine learning to obtain the benefits of both approaches. In this article, the lasso, a statistical technique for variable selection, is represented as a neural network. Although the statistical approach and its neural version share the same objective function, they differ in how they are optimized. In particular, the neural version is usually optimized in one step using a single validation set, while the statistical counterpart uses a two-step optimization based on cross-validation. The more elaborate optimization of the statistical method yields more accurate parameter estimates, especially when the training set is small. For this reason, a modification of the standard approach to training neural networks that mimics the statistical framework is proposed. During the development of this modification, a new optimization algorithm for identifying the significant variables emerged. Experimental results on synthetic and real data sets show that this new optimization algorithm outperforms each of the three other optimization approaches.
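The contrast between the two tuning strategies can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the article: the "neural version" is reduced to a single linear layer with an L1 penalty on its weights, proximal gradient descent (ISTA) is swapped in as the optimizer, and "two-step" is read as choosing the penalty by K-fold cross-validation and then refitting on all the training data. All function names (`fit_lasso_ista`, `select_lambda_single_split`, `select_lambda_cv`) are hypothetical, and the article's new algorithm is not reproduced here.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by proximal gradient descent.

    Assumption: a bias-free linear layer stands in for the neural network.
    """
    n, p = X.shape
    # Lipschitz constant of the gradient of the squared-error term.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - grad / lipschitz, lam / lipschitz)
    return w

def mse(X, y, w):
    """Mean squared prediction error of weights w on (X, y)."""
    return float(np.mean((y - X @ w) ** 2))

def select_lambda_single_split(X, y, lambdas, val_fraction=0.2, seed=0):
    """One-step style: pick lam on one validation split and keep that fit."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = max(1, int(val_fraction * len(y)))
    val, train = idx[:n_val], idx[n_val:]
    fits = [fit_lasso_ista(X[train], y[train], lam) for lam in lambdas]
    errors = [mse(X[val], y[val], w) for w in fits]
    best = int(np.argmin(errors))
    return lambdas[best], fits[best]

def select_lambda_cv(X, y, lambdas, k=5, seed=0):
    """Two-step style: pick lam by K-fold CV, then refit on all the data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mean_errors = []
    for lam in lambdas:
        fold_errors = []
        for i in range(k):
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            w = fit_lasso_ista(X[train], y[train], lam)
            fold_errors.append(mse(X[folds[i]], y[folds[i]], w))
        mean_errors.append(np.mean(fold_errors))
    best_lam = lambdas[int(np.argmin(mean_errors))]
    # Second step: refit with the chosen penalty on the full training set.
    return best_lam, fit_lasso_ista(X, y, best_lam)
```

In this sketch the single-split variant estimates the weights from only part of the data and judges each penalty on one held-out subset, whereas the cross-validated variant averages the validation error over all folds and uses every observation in the final fit. That difference is one plausible reason the abstract's small-sample claim could hold, though the sketch itself does not demonstrate it.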