A key aspect of human cognition is metacognition: the ability to assess one's own knowledge and the reliability of one's own judgments. While deep learning models can express confidence in their predictions, they often suffer from poor calibration, a cognitive bias in which expressed confidence does not reflect true competence. Do models truly know what they know? Drawing on human cognitive science, we propose a new framework for evaluating and leveraging AI metacognition. We introduce meta-d', a psychologically grounded measure of metacognitive sensitivity, to characterise how reliably a model's confidence predicts its own accuracy. We then use this dynamic sensitivity score as context for a bandit-based arbiter that performs test-time model selection, learning which of several expert models to trust for a given task. Our experiments across multiple datasets and deep learning model combinations (including CNNs and VLMs) demonstrate that this metacognitive approach improves joint-inference accuracy over that of the constituent models. This work provides a novel behavioural account of AI models, recasting ensemble selection as a problem of evaluating both short-term signals (per-prediction confidence scores) and medium-term traits (metacognitive sensitivity).
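To make the selection mechanism concrete, the following is a minimal sketch, not the paper's implementation, of an arbiter that treats each expert model as a bandit arm and uses two context features per expert: its instantaneous confidence (short-term signal) and a running metacognitive-sensitivity estimate over its past trials (medium-term trait). The sensitivity proxy here is a type-2 AUROC rather than a full meta-d' fit, and names such as `MetacognitiveArbiter` and `type2_auroc`, as well as the epsilon-greedy rule, are illustrative assumptions.

```python
import numpy as np

def type2_auroc(confidences, correct):
    """Proxy for metacognitive sensitivity: how well confidence ranks correct trials above errors."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos, neg = confidences[correct], confidences[~correct]
    if len(pos) == 0 or len(neg) == 0:
        return 0.5  # undefined -> chance level
    # P(confidence on a correct trial > confidence on an error), with ties counted as 0.5
    return ((pos[:, None] > neg[None, :]).mean()
            + 0.5 * (pos[:, None] == neg[None, :]).mean())

class MetacognitiveArbiter:
    """Epsilon-greedy contextual bandit over expert models (illustrative sketch)."""

    def __init__(self, n_experts, epsilon=0.1, lr=0.05):
        self.eps, self.lr = epsilon, lr
        # One linear reward estimate per expert over [confidence, sensitivity] features
        self.w = np.zeros((n_experts, 2))
        # Per-expert history of (confidences, correctness) used for the sensitivity proxy
        self.history = [([], []) for _ in range(n_experts)]

    def context(self, confidences):
        sens = [type2_auroc(c, a) for c, a in self.history]
        return np.stack([np.asarray(confidences, dtype=float), sens], axis=1)

    def select(self, confidences):
        """Pick which expert to trust for the current input, given each expert's confidence."""
        x = self.context(confidences)
        if np.random.rand() < self.eps:
            return np.random.randint(len(self.w))
        return int(np.argmax((self.w * x).sum(axis=1)))

    def update(self, expert, confidence, was_correct):
        """Reward the chosen expert with its 0/1 correctness and log the trial."""
        x = np.array([confidence, type2_auroc(*self.history[expert])])
        self.w[expert] += self.lr * (float(was_correct) - self.w[expert] @ x) * x
        self.history[expert][0].append(confidence)
        self.history[expert][1].append(was_correct)
```

Under these assumptions, the arbiter learns to discount experts whose confidence is high but poorly predictive of their own accuracy, which is the behavioural distinction the metacognitive framing is meant to capture.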