Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures produce deterministic outputs and offer limited support for calibrated predictive uncertainty. We propose Stochastic Attention, a lightweight inference-time modification that randomizes attention by replacing softmax weights with normalized multinomial samples controlled by a single concentration parameter, producing predictive ensembles without retraining. To set this parameter, we introduce a calibration objective that matches the stochastic attention output to the target, yielding an efficient univariate post-hoc tuning problem. We evaluate the mechanism on two scientific foundation models for weather and time-series forecasting, along with an additional regression task. Benchmarked against uncertainty-aware baselines, Stochastic Attention achieves the strongest native calibration and the sharpest prediction intervals at comparable coverage, while requiring only minutes of post-hoc tuning versus days of retraining for competitive baselines.
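To make the mechanism described in the abstract concrete, the following is a minimal NumPy sketch of one way it could work. It is an illustration under assumptions, not the authors' implementation: the abstract does not give the exact parameterization, so the sketch assumes the concentration parameter is the number of multinomial trials, that sampled counts are normalized by dividing by that number, and that a single-head, unbatched attention layer is enough to show the idea. The names `stochastic_attention`, `predictive_ensemble`, `concentration`, and `n_members` are hypothetical.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def stochastic_attention(q, k, v, concentration, rng):
    """Single-head attention with sampled weights (illustrative assumption).

    q: (n_queries, d), k: (n_keys, d), v: (n_keys, d_v).
    concentration: number of multinomial trials (assumed parameterization);
    larger values keep the sampled weights closer to the softmax weights.
    """
    d = q.shape[-1]
    probs = softmax(q @ k.T / np.sqrt(d))  # deterministic softmax weights
    # One multinomial draw per query row, using that row's softmax weights.
    counts = np.stack([rng.multinomial(concentration, p) for p in probs])
    weights = counts / concentration  # normalize so each row sums to 1
    return weights @ v


def predictive_ensemble(q, k, v, concentration, n_members, seed=0):
    """Repeat the stochastic forward pass to collect an ensemble of outputs."""
    rng = np.random.default_rng(seed)
    members = [
        stochastic_attention(q, k, v, concentration, rng)
        for _ in range(n_members)
    ]
    return np.stack(members)  # shape: (n_members, n_queries, d_v)
```

Under this assumed parameterization, the spread of the ensemble shrinks as `concentration` grows and the outputs approach ordinary softmax attention, which is what would make tuning a single scalar after training plausible as a calibration knob.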