In operations research (OR), predictive models often encounter out-of-distribution (OOD) scenarios, in which the data seen at deployment follow a different distribution than the training data. In recent years, neural networks (NNs) have gained traction in OR owing to their exceptional performance in fields such as image classification. However, NNs tend to make confident yet incorrect predictions when confronted with OOD data. Uncertainty estimation offers a remedy for overconfident models by indicating when an output should or should not be trusted. Hence, reliable uncertainty quantification in NNs is crucial in the OR domain. Deep ensembles, composed of multiple independent NNs, have emerged as a promising approach, offering not only strong predictive accuracy but also reliable uncertainty estimation. However, their deployment is challenging due to substantial computational demands. Recent fundamental research has proposed more efficient NN ensembles, namely the snapshot, batch, and multi-input multi-output ensembles. This study is the first to provide a comprehensive comparison of a single NN, a deep ensemble, and the three efficient NN ensembles. In addition, we propose a Diversity Quality metric that quantifies an ensemble's performance on the in-distribution and OOD sets in a single number. The OR case study addresses industrial parts classification, used to identify and manage spare parts, which is important for the timely maintenance of industrial plants. The results highlight the batch ensemble as a cost-effective and competitive alternative to the deep ensemble. It outperforms the deep ensemble in both uncertainty estimation and accuracy while achieving a 7x training time speedup, an 8x test time speedup, and 9x memory savings.
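To make the idea of ensemble-based uncertainty concrete, the following is a minimal sketch of how a deep ensemble's prediction and uncertainty are commonly computed: the members' softmax outputs are averaged, and the entropy of the averaged distribution serves as an uncertainty score. This is an illustration under stated assumptions, not the study's implementation. The function names `softmax` and `ensemble_predict`, the toy logits, and the choice of predictive entropy as the uncertainty measure are all assumptions made here, and the sketch does not reproduce the proposed Diversity Quality metric.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ensemble_predict(member_logits):
    """Combine ensemble members (hypothetical helper, not from the paper).

    member_logits: array of shape (members, inputs, classes).
    Returns the averaged class probabilities, shape (inputs, classes),
    and the predictive entropy per input, shape (inputs,).
    """
    probs = softmax(member_logits, axis=-1)
    mean_probs = probs.mean(axis=0)  # average over ensemble members
    # Small constant guards against log(0); entropy is in nats.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return mean_probs, entropy


# Toy logits (assumed values): 3 members, 2 inputs, 3 classes.
# Input 0: all members agree, as one would hope for in-distribution data.
# Input 1: each member favors a different class, mimicking OOD disagreement.
member_logits = np.array([
    [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]],
    [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]],
    [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]],
])

mean_probs, entropy = ensemble_predict(member_logits)
print("mean probabilities:\n", mean_probs)
print("predictive entropy:", entropy)
```

In this toy case the members disagree on the second input, so its averaged distribution is uniform and its entropy is higher than that of the first input. A downstream decision rule could, for example, defer high-entropy predictions to a human expert.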