The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty tailored to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (arising from inherent unpredictability), and further explore finer-grained categories within each. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. We achieve this by 1) inpainting images to turn previously answerable questions into unanswerable ones, and 2) prompting large language models with image captions to generate both answerable and unanswerable questions. Additionally, we introduce a new metric, confidence-weighted accuracy, which correlates well with both accuracy and calibration error, to address the shortcomings of existing metrics.
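To make the idea behind such a metric concrete, the sketch below shows one plausible form of a confidence-weighted accuracy: each answer's correctness is weighted by the model's stated confidence, so confident errors are penalized and appropriately low-confidence errors are not. This is an illustrative assumption, not necessarily the paper's exact definition; the function name and its scoring rule are hypothetical.

```python
def confidence_weighted_accuracy(correct, confidence):
    """Illustrative sketch (not the paper's exact formula):
    score each sample by its confidence if correct, or by
    (1 - confidence) if wrong, then average over samples.

    correct:    list of bools, whether each answer was right
    confidence: list of floats in [0, 1], the model's confidence
    """
    assert len(correct) == len(confidence) and len(correct) > 0
    total = 0.0
    for is_right, p in zip(correct, confidence):
        # confident correct answers score high; confident errors score low
        total += p if is_right else (1.0 - p)
    return total / len(correct)
```

Under this form, a model that is always right with confidence 1.0 scores 1.0, while a model that is wrong with high confidence is penalized more than one that is wrong but hedges, which is what ties the metric to both accuracy and calibration.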