Scaling machine learning models significantly improves their performance. However, such gains come at the cost of slow, resource-intensive inference. Early-exit neural networks (EENNs) offer a promising solution: they accelerate inference by allowing the forward pass to stop at an intermediate layer and return a prediction early. Yet a fundamental issue with EENNs is how to determine when to exit without severely degrading performance. In other words, when is it 'safe' for an EENN to go 'fast'? To address this issue, we investigate how to adapt risk control frameworks to EENNs. Risk control offers a distribution-free, post-hoc solution that tunes the EENN's exiting mechanism so that exits occur only when the output is of sufficient quality. We empirically validate our insights on a range of vision and language tasks, demonstrating that risk control can produce substantial computational savings while preserving user-specified performance goals.
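To make the idea of post-hoc threshold tuning concrete, the following is a minimal sketch and not necessarily the procedure evaluated in this work. It assumes a confidence-threshold exit rule, a per-sample loss bounded in [0, 1], and a Hoeffding upper confidence bound checked over a threshold grid from most to least conservative. The names `exit_index`, `calibrate_threshold`, `epsilon` (risk tolerance), and `delta` (allowed failure probability) are hypothetical, and the calibration data is a toy example.

```python
import math


def exit_index(confidences, threshold):
    """Return the first exit whose confidence reaches the threshold, else the last exit."""
    for i, c in enumerate(confidences):
        if c >= threshold:
            return i
    return len(confidences) - 1


def calibrate_threshold(conf, loss, epsilon, delta, grid):
    """Scan thresholds from most to least conservative and keep the last one
    whose Hoeffding upper bound on the calibration risk stays <= epsilon.

    conf[j][i]: confidence of exit i on calibration sample j
    loss[j][i]: loss in [0, 1] incurred if sample j exits at exit i
    Returns None if even the most conservative threshold fails the check.
    """
    n = len(conf)
    # Hoeffding slack for a mean of n values in [0, 1] at confidence 1 - delta.
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    chosen = None
    for lam in sorted(grid, reverse=True):
        risk = sum(loss[j][exit_index(conf[j], lam)] for j in range(n)) / n
        if risk + slack > epsilon:
            break  # stop at the first failure (fixed-sequence scan)
        chosen = lam
    return chosen


# Toy calibration set with two exits: "easy" samples are already correct at
# the first exit, "hard" samples are only correct at the final exit.
conf = [[0.9, 1.0] if j % 2 == 0 else [0.6, 1.0] for j in range(200)]
loss = [[0.0, 0.0] if j % 2 == 0 else [1.0, 0.0] for j in range(200)]

threshold = calibrate_threshold(conf, loss, epsilon=0.2, delta=0.1,
                                grid=[0.5, 0.7, 1.0])
print(threshold)  # 0.7: easy samples exit early, hard samples run to the end
```

In this toy setting, lowering the threshold to 0.5 would let the hard samples exit early and push the empirical risk to 0.5, so the scan stops at 0.7. Because the check uses only held-out losses and a concentration bound, it makes no assumption about the data distribution beyond the calibration samples being i.i.d.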