In this perspective paper, we first comprehensively review existing evaluations of Large Language Models (LLMs) using both standardized tests and ability-oriented benchmarks. We pinpoint several problems with current evaluation methods that tend to overstate the capabilities of LLMs. We then articulate what artificial general intelligence should encompass beyond the capabilities of LLMs. We propose four characteristics of generally intelligent agents: 1) they can perform an unlimited range of tasks; 2) they can generate new tasks within a context; 3) they operate based on a value system that underpins task generation; and 4) they have a world model reflecting reality, which shapes their interaction with the world. Building on this viewpoint, we highlight the missing piece in artificial general intelligence, namely the unity of knowing and acting. We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations. Additionally, knowledge acquisition does not rely solely on passive input but requires repeated trial and error. We conclude by outlining promising future research directions in the field of artificial general intelligence.