Benchmarks for Automated Commonsense Reasoning: A Survey

More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems. However, these benchmarks are often flawed and many aspects of common sense remain untested. Consequently, we do not currently have any reliable way of measuring to what extent existing AI systems have achieved these abilities. This paper surveys the development and uses of AI commonsense benchmarks. We discuss the nature of common sense; the role of common sense in AI; the goals served by constructing commonsense benchmarks; and desirable features of commonsense benchmarks. We analyze the common flaws in benchmarks, and we argue that it is worthwhile to invest the work needed to ensure that benchmark examples are of consistently high quality. We survey the various methods of constructing commonsense benchmarks. We enumerate 139 commonsense benchmarks that have been developed: 102 text-based, 18 image-based, 12 video-based, and 7 based on simulated physical environments. We discuss the gaps in the existing benchmarks and aspects of commonsense reasoning that are not addressed in any existing benchmark. We conclude with a number of recommendations for future development of commonsense AI benchmarks.

