Evaluating potential AGI systems and methods is difficult because of the breadth of the engineering goal. We have no method for perfectly evaluating the end state; instead, we measure performance on small tests designed to give a directional indication that we are approaching AGI. In this work we argue that AGI evaluation has been dominated by a design philosophy that uses our intuitions about what intelligence is to create synthetic tasks, an approach with a poor track record in the history of AI. We argue instead for an alternative design philosophy focused on evaluating robust task execution, which seeks to demonstrate AGI through competence. This perspective is grounded in common data science practices used to show that a system can be reliably deployed. We provide practical examples of what this approach would mean for AGI evaluation.
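As a minimal sketch of the kind of deployment-oriented data science practice alluded to above, the snippet below estimates a bootstrap confidence interval for a system's success rate on held-out task executions, a standard way to argue that observed competence is reliable rather than an artifact of a small test set. The function name and the sample counts are illustrative assumptions, not taken from the paper.

```python
import random

def bootstrap_success_ci(outcomes, n_resamples=10_000, alpha=0.05, seed=0):
    """Bootstrap a (1 - alpha) confidence interval for a task success rate.

    `outcomes` is a list of 0/1 results from held-out task executions.
    (Illustrative helper; not part of the paper's method.)
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the success rate of each resample.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles as the interval.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(outcomes) / n, (lo, hi)

# Hypothetical example: 92 successes out of 100 held-out task executions.
outcomes = [1] * 92 + [0] * 8
rate, (lo, hi) = bootstrap_success_ci(outcomes)
print(f"success rate {rate:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Under this style of evaluation, a claim of robust task execution rests on the width and location of such an interval over realistic held-out tasks, not on performance on a single intuition-driven synthetic benchmark.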