Evaluating true metacognition in Large Language Models (LLMs) is difficult because biases and heuristics can confound the measurement. This paper presents a framework that measures and enhances LLM metacognition while controlling for these biases. First, a measurement method based on the $d'_{\rm type2}$ metric is established to isolate metacognitive ability. Second, the Evolution Strategy for Metacognitive Alignment (ESMA) is proposed and shown to generalize robustly to unseen datasets, languages, and newly acquired knowledge. Finally, parameter analysis reveals that these improvements are driven by a sparse set of parameters, opening new pathways for targeted metacognitive optimization.
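To make the $d'_{\rm type2}$ idea concrete, the following is a minimal sketch of the standard signal-detection form of a type-2 sensitivity index, $d'_{\rm type2} = z(\text{hit rate}) - z(\text{false-alarm rate})$, where a hit is a confident report on a correct answer and a false alarm is a confident report on an incorrect one. This is an illustration under stated assumptions, not necessarily the exact estimator used in the paper: the function name `type2_dprime`, the binary confidence coding, and the log-linear correction for extreme rates are all hypothetical choices made here.

```python
from statistics import NormalDist


def type2_dprime(correct, confident):
    """Type-2 d' from paired binary outcomes (illustrative sketch).

    correct[i]   -- True if the model's answer to item i was right
    confident[i] -- True if the model reported knowing the answer
    """
    if len(correct) != len(confident):
        raise ValueError("correct and confident must have the same length")

    n_correct = sum(1 for c in correct if c)
    n_incorrect = len(correct) - n_correct

    # Type-2 hit: confident and correct. Type-2 false alarm: confident but wrong.
    hits = sum(1 for c, k in zip(correct, confident) if c and k)
    false_alarms = sum(1 for c, k in zip(correct, confident) if not c and k)

    # Assumed log-linear correction: keeps both rates strictly inside (0, 1)
    # so the inverse normal CDF stays finite.
    hit_rate = (hits + 0.5) / (n_correct + 1)
    fa_rate = (false_alarms + 0.5) / (n_incorrect + 1)

    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    correct = [True] * 5 + [False] * 5
    always_confident = [True] * 10   # pure response bias, no discrimination
    calibrated = list(correct)       # confident exactly when correct
    print(type2_dprime(correct, always_confident))
    print(type2_dprime(correct, calibrated))
```

In this sketch a model that claims confidence on every item has equal hit and false-alarm rates and therefore scores zero, while a model whose confidence tracks its correctness scores above zero. This is the sense in which a sensitivity index of this kind separates metacognitive ability from a blanket tendency to answer "I know".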