Large pretrained models, coupled with fine-tuning, are gradually becoming established as the dominant approach in machine learning. Although these models offer impressive performance, their practical application is often limited by the prohibitive amount of resources required for every inference. Early-exiting dynamic neural networks (EDNNs) circumvent this issue by allowing a model to make some of its predictions from intermediate layers (i.e., to exit early). Training an EDNN architecture is challenging because it consists of two intertwined components: the gating mechanism (GM), which controls early-exiting decisions, and the intermediate inference modules (IMs), which perform inference from intermediate representations. As a result, most existing approaches rely on thresholding confidence metrics for the gating mechanism and focus on improving the underlying backbone network and the inference modules. Although successful, this approach has two fundamental shortcomings: 1) the GMs and the IMs are decoupled during training, leading to a train-test mismatch; and 2) the thresholding gating mechanism introduces a positive bias into the predictive probabilities, making it difficult to readily extract uncertainty information. We propose a novel architecture that connects these two modules. This leads to significant performance improvements on classification datasets and enables better uncertainty characterization.
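To make the baseline concrete, the following is a minimal sketch of the confidence-thresholding gate that the abstract attributes to most existing approaches. It is not the proposed architecture. The function names `softmax` and `early_exit_predict`, the default `threshold` of 0.9, and the use of maximum softmax probability as the confidence metric are all assumptions chosen for illustration. For simplicity the logits of every exit are passed in precomputed; in an actual EDNN the deeper layers would only be evaluated if the shallower exits decline to answer.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Return (exit_index, predicted_class, confidence).

    exit_logits: list of logit vectors, one per inference module,
    ordered from the shallowest exit to the final classifier.
    The gate (hypothetical, for illustration) exits at the first module
    whose maximum softmax probability reaches `threshold`; the final
    module always answers.
    """
    if not exit_logits:
        raise ValueError("at least one exit is required")
    last = len(exit_logits) - 1
    for exit_index, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold or exit_index == last:
            return exit_index, probs.index(confidence), confidence


# Three exits; the second one is the first to be confident enough.
example = [[0.1, 0.2, 0.0], [4.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
exit_index, predicted_class, confidence = early_exit_predict(example)
```

In this sketch, every sample that leaves before the final exit has confidence at or above the threshold by construction, and the gate is applied only at inference time on top of independently trained modules. Both properties correspond to the shortcomings listed above: the selection inflates the reported probabilities, and the gating and inference modules never see each other during training.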