Regularisation in neural networks: a survey and empirical analysis of approaches

From arXiv. 15 pages, 4 figures, 4 tables. For the associated code, see https://github.com/Christo08/Benchmarks-of-regularisation-techniques.git

Despite huge successes on a wide range of tasks, neural networks are known to sometimes struggle to generalise to unseen data. Many approaches have been proposed over the years to promote the generalisation ability of neural networks, collectively known as regularisation techniques. These are used as common practice under the assumption that any regularisation added to the pipeline would result in a performance improvement. In this study, we investigate whether this assumption holds in practice. First, we provide a broad review of regularisation techniques, including modern theories such as double descent. We propose a taxonomy of methods under four broad categories, namely: (1) data-based strategies, (2) architecture strategies, (3) training strategies, and (4) loss function strategies. Notably, we highlight the contradictions and correspondences between the approaches in these broad classes. Further, we perform an empirical comparison of the various regularisation techniques on classification tasks across ten numerical and image datasets, using multi-layer perceptron and convolutional neural network architectures. Results show that the efficacy of regularisation is dataset-dependent. For example, the use of a regularisation term only improved performance on numeric datasets, whereas batch normalisation improved performance on image datasets only. Generalisation is crucial to machine learning; thus, understanding the effects of applying regularisation techniques and considering the connections between them are essential to the appropriate use of these methods in practice.
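To make the phrase "regularisation term" concrete, the following is a minimal sketch of a loss-function strategy, assuming an L2 (weight-decay style) penalty added to a cross-entropy loss for a single example. The function names, the choice of L2, and the default strength `lam` are illustrative assumptions and are not taken from the paper or its repository.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class of one example."""
    return -math.log(probs[label])


def l2_penalty(weights, lam):
    """Hypothetical regularisation term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularised_loss(probs, label, weights, lam=0.01):
    """Data loss plus penalty; lam = 0 recovers the unregularised loss."""
    return cross_entropy(probs, label) + l2_penalty(weights, lam)
```

Larger values of `lam` push the optimiser towards smaller weights at the expense of fitting the training data. Whether that trade-off helps is the dataset-dependent question the study examines.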
