Metacognition, the awareness of one's own knowledge, is a critical component of intelligence. While humans rely on a shared internal memory both to answer questions and to report their knowledge state, whether LLMs exhibit the same dependency remains underexplored. This study proposes a framework that measures metacognitive ability $d'_{\mathrm{type2}}$ with a dual-prompt method, then introduces Evolution Strategy for Metacognitive Alignment (ESMA) to bind a model's internal knowledge to its explicit behavior. ESMA demonstrates robust generalization across diverse untrained settings, indicating an enhancement in the model's ability to reference its own knowledge. Furthermore, parameter analysis attributes these improvements to a sparse set of significant modifications.
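As a point of reference for the $d'_{\mathrm{type2}}$ measure, the sketch below computes the standard type-2 sensitivity index from signal detection theory: how well confidence judgments discriminate a responder's correct from incorrect first-order answers. The specific counting scheme and the log-linear correction are assumptions for illustration; the abstract does not specify how the dual-prompt responses are scored.

```python
from statistics import NormalDist

def type2_dprime(conf_correct, unconf_correct, conf_wrong, unconf_wrong):
    """Type-2 d' = z(type-2 hit rate) - z(type-2 false-alarm rate).

    A type-2 'hit' is high confidence on a correct answer; a type-2
    'false alarm' is high confidence on an incorrect answer.
    """
    z = NormalDist().inv_cdf
    # Log-linear correction keeps rates away from 0/1, where z diverges.
    hit_rate = (conf_correct + 0.5) / (conf_correct + unconf_correct + 1)
    fa_rate = (conf_wrong + 0.5) / (conf_wrong + unconf_wrong + 1)
    return z(hit_rate) - z(fa_rate)

# Hypothetical tallies: confidence tracks accuracy, so d' comes out positive.
score = type2_dprime(conf_correct=80, unconf_correct=20,
                     conf_wrong=30, unconf_wrong=70)
print(score)
```

A model whose stated confidence is unrelated to its actual accuracy would score near zero, which is the failure mode a metacognitive-alignment method like ESMA targets.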