Scaling machine learning models significantly improves their performance. However, such gains come at the cost of inference being slow and resource-intensive. Early-exit neural networks (EENNs) offer a promising solution: they accelerate inference by allowing computation to halt at intermediate layers and produce a prediction early. Yet a fundamental issue with EENNs is how to determine when to exit without severely degrading performance. In other words, when is it 'safe' for an EENN to go 'fast'? To address this issue, we investigate how to adapt frameworks of risk control to EENNs. Risk control offers a distribution-free, post-hoc solution that tunes the EENN's exiting mechanism so that exits only occur when the output is of sufficient quality. We empirically validate our insights on a range of vision and language tasks, demonstrating that risk control can produce substantial computational savings, all the while preserving user-specified performance goals.
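To make the idea concrete, here is a minimal sketch of how a risk-controlled exit threshold could be calibrated post hoc. This is an illustrative assumption, not the paper's exact procedure: the function name, the Hoeffding-style confidence bound, and the fixed-sequence sweep from conservative to aggressive thresholds are all stand-ins for whichever risk-control framework is actually used.

```python
import numpy as np

def calibrate_exit_threshold(confidences, losses, epsilon=0.1, delta=0.05,
                             grid=None):
    """Pick the most aggressive exit-confidence threshold whose risk is
    still provably controlled on a held-out calibration set.

    Hypothetical sketch: sweeps thresholds from conservative (1.0, almost
    never exit early) to aggressive (0.0, always exit early) and keeps the
    last threshold whose empirical risk plus a Hoeffding slack stays below
    the user-specified level epsilon, stopping at the first violation
    (fixed-sequence testing).

    confidences : exit-gate confidence per calibration sample (higher = surer)
    losses      : loss in [0, 1] incurred if that sample were to exit early
    """
    n = len(losses)
    if grid is None:
        grid = np.linspace(1.0, 0.0, 101)  # conservative -> aggressive
    # Hoeffding upper-confidence slack, valid for losses bounded in [0, 1]
    slack = np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    best = 1.0  # fall back to "never exit early" if nothing is safe
    for lam in grid:
        exits = confidences >= lam
        # empirical risk of the rule "exit whenever confidence >= lam"
        risk = losses[exits].mean() if exits.any() else 0.0
        if risk + slack <= epsilon:
            best = lam  # still safe; try exiting even more often
        else:
            break  # first violation: stop, keep the last safe threshold
    return best
```

On synthetic calibration data where higher exit confidence means a lower chance of error, the calibrated threshold lets a large fraction of samples exit early while the empirical risk of those exits stays below the target `epsilon`.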