Early-exit neural networks have become popular for reducing inference latency by allowing intermediate predictions once sufficient confidence is reached. However, standard approaches typically rely on single-model confidence thresholds, which are often unreliable due to inherent calibration issues. To address this, we introduce SQUAD (Scalable Quorum Adaptive Decisions), the first inference scheme that integrates early-exit mechanisms with distributed ensemble learning, improving uncertainty estimation while reducing inference time. Unlike traditional methods that depend on individual confidence scores, SQUAD applies a quorum-based stopping criterion over early-exit learners: it collects intermediate predictions incrementally, in order of computational complexity, until a consensus is reached, and halts computation at that exit if the consensus is statistically significant. To maximize the efficacy of this voting mechanism, we also introduce QUEST (Quorum Search Technique), a Neural Architecture Search method that selects early-exit learners with optimized hierarchical diversity, ensuring the learners are complementary at every intermediate layer. This consensus-driven approach yields statistically robust early exits, improving test accuracy by up to 5.95% over state-of-the-art dynamic solutions at comparable computational cost, and reducing inference latency by up to 70.60% compared to static ensembles while maintaining good accuracy.
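The quorum-based stopping rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the one-sided binomial test against a chance baseline, and the significance level are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of a quorum-based early-exit stopping rule in the
# spirit of SQUAD. The binomial significance test and all names are
# illustrative assumptions, not the paper's exact method.
from collections import Counter
from math import comb


def binom_sf(k: int, n: int, p: float = 0.5) -> float:
    # P[X >= k] for X ~ Binomial(n, p): survival function used as a
    # simple one-sided significance test for the running majority.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def quorum_exit(stage_predictions, alpha=0.05):
    """Walk exits from cheapest to most expensive, accumulating each
    learner's prediction stage by stage; stop early once the running
    majority vote is statistically significant at level `alpha`."""
    votes = []
    for stage, preds in enumerate(stage_predictions):
        votes.extend(preds)  # add this stage's learner outputs
        label, count = Counter(votes).most_common(1)[0]
        # Is a majority this large unlikely under a chance (p=0.5) baseline?
        if binom_sf(count, len(votes)) < alpha:
            return stage, label  # early exit with the consensus label
    return len(stage_predictions) - 1, label  # fall back to the final exit
```

For example, with three learners that all agree, three votes at the first exit are not yet significant at α = 0.05 (P ≈ 0.125 under chance), but six agreeing votes after the second exit are (P ≈ 0.016), so computation halts there without running the remaining, more expensive exits.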