Uncertainty quantification (UQ) is essential for deploying deep neural networks in safety-critical settings. Although methods like Deep Ensembles achieve strong UQ performance, their high computational and memory costs hinder scalability to large models. We introduce Hydra Ensembles, an efficient transformer-based ensemble that prunes attention heads to create diverse members and merges them via a new multi-head attention with grouped fully-connected layers. This yields a compact model with inference speed close to that of a single network, matching or surpassing Deep Ensembles in UQ performance without retraining from scratch. We also provide an in-depth analysis of pruning, showing that naive approaches can harm calibration, whereas Hydra Ensembles preserves robust uncertainty estimates. Experiments on image and text classification tasks, across various architectures, show consistent gains over Deep Ensembles. Remarkably, in zero-shot classification on ImageNet-1k, our approach surpasses state-of-the-art methods, even without additional training.
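To make the merging idea concrete, below is a minimal sketch of how grouped fully-connected layers can fuse several pruned members into one multi-head attention module, so that members share a single forward pass without mixing activations. This is an illustration under assumptions, not the authors' implementation: the class names `GroupedLinear` and `HydraAttention`, the member count `members`, the per-member head count `heads_per_member`, and the convention of concatenating member features along the channel axis are all hypothetical choices made here for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedLinear(nn.Module):
    """Block-diagonal linear layer: input chunk g is projected only by
    weight block g, so the merged ensemble members never mix activations."""

    def __init__(self, in_features: int, out_features: int, groups: int):
        super().__init__()
        assert in_features % groups == 0 and out_features % groups == 0
        self.groups = groups
        self.weight = nn.Parameter(
            0.02 * torch.randn(groups, in_features // groups, out_features // groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape                        # (batch, seq, in_features)
        xg = x.view(b, s, self.groups, -1)       # split features per member
        # apply each member's weight block independently in one einsum
        yg = torch.einsum("bsgi,gio->bsgo", xg, self.weight)
        return yg.reshape(b, s, -1)


class HydraAttention(nn.Module):
    """Attention over the union of all members' surviving heads; grouped
    projections keep members independent while sharing one fused module."""

    def __init__(self, embed_dim: int, heads_per_member: int, members: int):
        super().__init__()
        self.m, self.h = members, heads_per_member
        self.head_dim = embed_dim // heads_per_member
        width = members * embed_dim              # concatenated member width
        self.qkv = GroupedLinear(width, 3 * width, groups=members)
        self.out = GroupedLinear(width, width, groups=members)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape                        # x: (batch, seq, m * embed_dim)
        # each member's output block holds its own q, k, v side by side
        q, k, v = self.qkv(x).view(b, s, self.m, 3, -1).unbind(dim=3)

        def heads(t):                            # -> (batch, m*h, seq, head_dim)
            return t.reshape(b, s, self.m * self.h, self.head_dim).transpose(1, 2)

        o = F.scaled_dot_product_attention(heads(q), heads(k), heads(v))
        return self.out(o.transpose(1, 2).reshape(b, s, -1))


# Usage (hypothetical sizes): 3 pruned members, each keeping 4 heads
# of a 256-dim encoder; member features are concatenated channel-wise.
mha = HydraAttention(embed_dim=256, heads_per_member=4, members=3)
tokens = torch.randn(2, 10, 3 * 256)
print(mha(tokens).shape)                         # torch.Size([2, 10, 768])
```

The block-diagonal structure is what makes the fused model nearly as fast as a single network: one matrix multiply per projection serves all members at once, while the grouping guarantees their predictions stay diverse enough to average for UQ.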