Throughout the evolution of large models, performance evaluation is necessary to assess their capabilities and ensure their safety before practical deployment. However, current model evaluations rely mainly on specific tasks and datasets and lack a unified framework for assessing the multidimensional intelligence of large models. In this perspective, we advocate a comprehensive framework of cognitive science-inspired artificial general intelligence (AGI) tests, aimed at meeting the testing needs of large models with enhanced capabilities. The cognitive science-inspired AGI tests encompass the full spectrum of intelligence facets, including crystallized intelligence, fluid intelligence, social intelligence, and embodied intelligence. To assess the multidimensional intelligence of large models, the AGI tests consist of a battery of well-designed cognitive tests adapted from human intelligence tests, which are then naturally encapsulated in an immersive virtual community. We propose increasing the complexity of AGI testing tasks commensurate with advancements in large models, and we emphasize the necessity of carefully interpreting test results to avoid false negatives and false positives. We believe that cognitive science-inspired AGI tests will effectively guide the targeted improvement of large models in specific dimensions of intelligence and accelerate the integration of large models into human society.