Recent advances in vision-language-action (VLA) models for robotics have highlighted the importance of reliable uncertainty quantification in sequential tasks. However, assessing and improving calibration in such settings remains largely unexplored, especially when only partial trajectories are observed. In this work, we formulate sequential calibration for episodic tasks, in which a task-success confidence is produced throughout an episode, whereas success is determined only at its end. We introduce a sequential extension of the Brier score and show that, for binary outcomes, its risk minimizer coincides with the VLA policy's value function. This connection bridges uncertainty calibration and reinforcement learning, enabling temporal-difference (TD) value estimation to serve as a principled calibration mechanism over time. We empirically show that TD calibration improves performance relative to the state of the art on simulated and real-robot data. Interestingly, we show that when calibrated using TD, the VLA's single-step action probabilities can yield competitive uncertainty estimates, in contrast to recent findings that employed different calibration techniques.
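The link between the Brier score and the value function follows from a standard fact: at any step t, the expected squared error E[(p_t - Y)^2 | s_t] against a binary outcome Y is minimized by p_t = E[Y | s_t] = P(success | s_t). That is exactly the policy's value V(s_t) when the reward is 1 for success and 0 otherwise, delivered only at the end of the episode, with no discounting. The sketch below illustrates this under stated assumptions and is not the authors' implementation. `sequential_brier` assumes uniform weighting over time steps. `td0_calibrate` is a plain tabular TD(0) estimator over hypothetical discrete state keys (for example, binned VLA action probabilities). Both function names and the toy data are illustrative.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical episode format: (sequence of discrete state keys, final binary outcome).
Episode = Tuple[Sequence[Hashable], int]


def sequential_brier(confidences: Sequence[float], success: int) -> float:
    """Mean squared error between each step's confidence and the final outcome.

    Uniform weighting over time steps is an illustrative assumption.
    """
    if len(confidences) == 0:
        raise ValueError("an episode needs at least one confidence value")
    return sum((p - success) ** 2 for p in confidences) / len(confidences)


def td0_calibrate(
    episodes: List[Episode],
    alpha: float = 0.1,
    n_sweeps: int = 200,
    init: float = 0.5,
) -> Dict[Hashable, float]:
    """Tabular TD(0) estimate of P(success | state).

    Reward is 0 at every step except the last, where it equals the binary
    outcome; with gamma = 1 the value is the probability of eventual success.
    """
    values: Dict[Hashable, float] = {}
    for _ in range(n_sweeps):
        for states, success in episodes:
            for t, s in enumerate(states):
                v = values.get(s, init)
                if t + 1 < len(states):
                    # Bootstrap from the current estimate at the next step.
                    target = values.get(states[t + 1], init)
                else:
                    # Last step: the target is the observed outcome.
                    target = float(success)
                values[s] = v + alpha * (target - v)
    return values


if __name__ == "__main__":
    # Two toy episodes share start state "a"; one succeeds, one fails.
    toy = [(("a", "b"), 1), (("a", "c"), 0)]
    V = td0_calibrate(toy)
    for states, y in toy:
        conf = [V[s] for s in states]
        print(states, [round(c, 3) for c in conf], round(sequential_brier(conf, y), 3))
```

In this toy case, V["b"] approaches 1 and V["c"] approaches 0, while the shared start state settles near 0.5, the empirical success rate from "a". With a constant step size and a fixed visiting order, it hovers slightly below 0.5 rather than converging exactly. A decaying step size would bring it closer.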