Uncertainty calibration in pre-trained transformers is critical for their reliable deployment in risk-sensitive applications. Yet most existing pre-trained transformers lack a principled mechanism for propagating uncertainty through their stack of feature transformations. In this work, we propose a diffusion-inspired reconfiguration of transformers in which each feature transformation block is modeled as a probabilistic mapping. Composing these probabilistic mappings reveals a probability path that mimics the structure of a diffusion process, transporting probability mass from the input distribution to the pre-trained feature distribution. This probability path can then be recompiled into a diffusion process with a unified transition model, enabling principled propagation of representation uncertainty throughout the pre-trained model's architecture while preserving its original predictive performance. Empirical results across a variety of vision and language benchmarks demonstrate that our method achieves superior calibration and predictive accuracy compared to existing uncertainty-aware transformers.