Metacognitive Sensitivity for Test-Time Dynamic Model Selection

A key aspect of human cognition is metacognition: the ability to assess one's own knowledge and the reliability of one's judgments. While deep learning models can express confidence in their predictions, they often suffer from poor calibration, a cognitive bias in which expressed confidence does not reflect true competence. Do models truly know what they know? Drawing on human cognitive science, we propose a new framework for evaluating and leveraging AI metacognition. We introduce meta-d', a psychologically grounded measure of metacognitive sensitivity, to characterise how reliably a model's confidence predicts its own accuracy. We then use this dynamic sensitivity score as context for a bandit-based arbiter that performs test-time model selection, learning which of several expert models to trust for a given task. Our experiments across multiple datasets and deep learning model combinations (including CNNs and VLMs) demonstrate that this metacognitive approach improves joint-inference accuracy over that of the constituent models. This work provides a novel behavioural account of AI models, recasting ensemble selection as a problem of evaluating both short-term signals (confidence prediction scores) and medium-term traits (metacognitive sensitivity).
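
To make the idea of metacognitive sensitivity concrete, the sketch below computes a simpler, nonparametric stand-in for meta-d': the type-2 AUROC, i.e. the probability that a model assigns higher confidence to one of its correct predictions than to one of its incorrect ones. This is not the meta-d' fit itself, and the selection rule `pick_model` is a hypothetical heuristic used purely for illustration, not the bandit-based arbiter described above. All function names and the toy data are assumptions.

```python
def type2_auroc(confidences, correct):
    """Type-2 AUROC: chance that a correct trial outranks an incorrect one.

    A nonparametric proxy for metacognitive sensitivity (NOT meta-d').
    0.5 means confidence carries no information about accuracy; 1.0 means
    confidence separates correct from incorrect trials perfectly.
    """
    hits = [c for c, ok in zip(confidences, correct) if ok]
    misses = [c for c, ok in zip(confidences, correct) if not ok]
    if not hits or not misses:
        # Undefined without both outcomes; falling back to chance is an assumption.
        return 0.5
    wins = 0.0
    for h in hits:
        for m in misses:
            if h > m:
                wins += 1.0
            elif h == m:
                wins += 0.5  # ties count as half a win
    return wins / (len(hits) * len(misses))


def pick_model(current_confidences, sensitivities):
    """Hypothetical rule: trust confidence in proportion to above-chance sensitivity."""
    scores = [
        conf * max(sens - 0.5, 0.0)
        for conf, sens in zip(current_confidences, sensitivities)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy history: both models are 50% accurate, but only model A's confidence
# tracks its own correctness.
outcomes = [True, True, False, False]
sens_a = type2_auroc([0.9, 0.8, 0.3, 0.2], outcomes)
sens_b = type2_auroc([0.9, 0.2, 0.8, 0.3], outcomes)

# On a new input, model B is more confident, yet model A (index 0) is chosen.
chosen = pick_model([0.6, 0.95], [sens_a, sens_b])
```

The toy example illustrates the distinction drawn in the abstract: the per-input confidence is a short-term signal, while the sensitivity estimated over a history of trials is a medium-term trait that determines how much that signal should be trusted.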
