The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance with safety and structural constraints. However, recent research has observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts with the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high-quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available at: https://unitn-sml.github.io/rsbench.
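To make the notion of a reasoning shortcut concrete, here is a minimal toy sketch (not rsbench's API; the task and names are illustrative assumptions). On an XOR-style reasoning task, a predictor that systematically flips both input concepts still produces the correct downstream label on every input, even though its concepts never match the ground truth:

```python
# Illustrative toy example of a reasoning shortcut (RS); this is a
# hypothetical sketch, not code from the rsbench suite.
from itertools import product

def knowledge(c1: int, c2: int) -> int:
    """Background knowledge: the downstream label is c1 XOR c2."""
    return c1 ^ c2

def shortcut_predictor(c1: int, c2: int) -> tuple[int, int]:
    """A systematically wrong concept extractor: flips both concepts."""
    return 1 - c1, 1 - c2

# The shortcut solves the downstream task on every input...
for c1, c2 in product([0, 1], repeat=2):
    s1, s2 = shortcut_predictor(c1, c2)
    assert knowledge(s1, s2) == knowledge(c1, c2)
    # ...yet its concepts are always wrong:
    assert (s1, s2) != (c1, c2)

print("shortcut achieves perfect label accuracy with 0% concept accuracy")
```

Label-level supervision alone cannot distinguish this predictor from one with correct concepts, which is why concept-quality metrics and verification procedures of the kind rsbench provides are needed.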