This study investigates the metacognitive capabilities of Large Language Models relative to human metacognition in the context of a mock International Coaching Federation (ICF) exam, a situational judgment test of coaching competencies. Using a mixed-methods approach, we assessed the metacognitive performance of human participants and five advanced LLMs (GPT-4, Claude 3 Opus, Mistral Large, Llama 3, and Gemini 1.5 Pro), including metacognitive sensitivity, accuracy of probabilistic predictions, and bias. The results indicate that the LLMs outperformed humans across all metacognitive metrics, most notably in showing less overconfidence. However, both LLMs and humans showed limited adaptability in ambiguous scenarios, adhering closely to predefined decision frameworks. The study suggests that generative AI can effectively engage in human-like metacognitive processing without conscious awareness. Implications of the study are discussed in relation to the development of AI simulators that scaffold the cognitive and metacognitive aspects of mastering coaching competencies and, more broadly, the development of metacognitive modules that lead toward more autonomous and intuitive AI systems.
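To make the abstract's metrics concrete, the sketch below illustrates two common calibration measures of the kind named above: a Brier score for the accuracy of probabilistic predictions, and a confidence bias (overconfidence) index. These particular formulas are standard in the metacognition literature but are an assumption here, not taken from the study itself.

```python
def brier_score(confidences, outcomes):
    # Mean squared difference between stated confidence (0..1) and the
    # binary outcome (1 = correct answer); lower means better calibration.
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(outcomes)

def confidence_bias(confidences, outcomes):
    # Mean confidence minus mean accuracy; positive values indicate
    # overconfidence, negative values underconfidence.
    return sum(confidences) / len(confidences) - sum(outcomes) / len(outcomes)

# Hypothetical example: four exam items with stated confidence and correctness.
conf = [0.9, 0.8, 0.9, 0.6]
correct = [1, 1, 0, 1]

print(round(brier_score(conf, correct), 3))     # 0.255
print(round(confidence_bias(conf, correct), 3)) # 0.05 (slight overconfidence)
```

On this toy data the respondent's mean confidence (0.8) exceeds their accuracy (0.75), yielding a small positive bias, which is the pattern the study reports as stronger in humans than in LLMs.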