Benchmarks for Physical Reasoning AI

Physical reasoning is a crucial aspect of developing general AI systems, given that human learning starts with interacting with the physical world before progressing to more complex concepts. Although researchers have studied and assessed the physical reasoning of AI approaches through various specific benchmarks, there is no comprehensive approach to evaluating and measuring progress. We therefore offer an overview of existing benchmarks and their solution approaches and propose a unified perspective for measuring the physical reasoning capacity of AI systems. We select benchmarks that are designed to test algorithmic performance in physical reasoning tasks. While each selected benchmark poses a unique challenge, their ensemble provides a comprehensive proving ground for a generalist AI agent, with a measurable skill level for each of various physical reasoning concepts. This gives such an ensemble an advantage over holistic benchmarks that aim to simulate the real world by intertwining its complexity and many concepts. We group the presented physical reasoning benchmarks into subcategories so that narrower generalist AI agents can be tested on these groups first.