Neural network ensembles have been used effectively to improve generalization by combining the predictions of multiple independently trained models. However, the growing scale and complexity of deep neural networks have made these methods prohibitively expensive and time-consuming to implement. Low-cost ensemble methods have become increasingly important, as they alleviate the need to train multiple models from scratch while retaining the generalization benefits that traditional ensemble learning affords. This dissertation introduces and formalizes a low-cost framework for constructing Subnetwork Ensembles, in which a collection of child networks is formed by sampling, perturbing, and optimizing subnetworks from a trained parent model. We explore several distinct methodologies for generating child networks and evaluate their efficacy through a variety of ablation studies and established benchmarks. Our findings reveal that this approach can greatly improve training efficiency, parametric utilization, and generalization performance while minimizing computational cost. Subnetwork Ensembles offer a compelling framework for exploring how we can build better systems by leveraging the unrealized potential of deep neural networks.