In this work, we introduce a novel approach to regularization in multivariable regression problems. Our regularizer, called DLoss, penalises differences between the model's derivatives and the derivatives of the data-generating function as estimated from the training data. We call these estimated derivatives data derivatives. The goal of our method is to align the model with the data, not only in terms of target values but also in terms of derivatives. To estimate data derivatives, we select pairs of (input, target) examples from the training data, using either nearest-neighbour or random selection. On synthetic and real datasets, we evaluate the effect of adding DLoss, with different weights, to the standard mean squared error (MSE) loss. The experimental results show that DLoss with nearest-neighbour selection achieves, on average, the best rank with respect to MSE on the validation sets, compared with no regularization, L2 regularization, and Dropout.
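The idea described above can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formula, so everything here is an assumption: the data derivative is taken to be a finite-difference slope between a training point and its nearest neighbour, the model derivative is the directional derivative of the model along the same direction, and the penalty is the mean squared difference between the two. The function names (`nearest_neighbour_pairs`, `data_derivatives`, `dloss`, `total_loss`) are hypothetical and do not come from the paper.

```python
import numpy as np


def nearest_neighbour_pairs(X):
    """For each row i of X, return the index j of its nearest other row.

    Assumes all rows of X are distinct (otherwise a distance of zero
    would later cause a division by zero).
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point may not pair with itself
    return np.argmin(dist, axis=1)


def data_derivatives(X, y, j):
    """Finite-difference slope of the targets from each x_i towards x_j.

    Returns the slopes and the unit directions u_i = (x_j - x_i) / ||x_j - x_i||.
    """
    delta = X[j] - X
    norm = np.linalg.norm(delta, axis=1)
    return (y[j] - y) / norm, delta / norm[:, None]


def dloss(grad_fn, X, y, j):
    """Assumed form of the penalty: mean squared gap between the model's
    directional derivative and the estimated data derivative."""
    d_data, u = data_derivatives(X, y, j)
    d_model = np.sum(grad_fn(X) * u, axis=1)  # grad f(x_i) . u_i
    return np.mean((d_model - d_data) ** 2)


def total_loss(predict_fn, grad_fn, X, y, weight):
    """MSE plus the weighted derivative penalty."""
    mse = np.mean((predict_fn(X) - y) ** 2)
    return mse + weight * dloss(grad_fn, X, y, nearest_neighbour_pairs(X))


# Toy usage with a linear model f(x) = x . w, whose gradient is w everywhere.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w = np.array([1.0, -2.0, 0.5])
y = X @ w
loss = total_loss(lambda A: A @ w, lambda A: np.tile(w, (len(A), 1)), X, y, weight=0.1)
print(loss)
```

Random selection, the other strategy mentioned in the abstract, could be sketched by replacing `nearest_neighbour_pairs` with a function that draws a random partner index for each point. For a neural network, `grad_fn` would typically come from an automatic-differentiation library rather than a closed form.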