The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance with safety and structural constraints. However, recent research has observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts with the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high-quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available at: https://unitn-sml.github.io/rsbench.
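To make the notion of a reasoning shortcut concrete, here is a minimal toy sketch (not rsbench's API; the task and names are illustrative assumptions). On an XOR-style reasoning task, a predictor that systematically flips both input concepts still produces the correct downstream label on every input, even though its concepts never match the ground truth:

```python
# Illustrative toy example of a reasoning shortcut (RS); this is a
# hypothetical sketch, not code from the rsbench suite.
from itertools import product

def knowledge(c1: int, c2: int) -> int:
    """Background knowledge: the downstream label is c1 XOR c2."""
    return c1 ^ c2

def shortcut_predictor(c1: int, c2: int) -> tuple[int, int]:
    """A systematically wrong concept extractor: flips both concepts."""
    return 1 - c1, 1 - c2

# The shortcut solves the downstream task on every input...
for c1, c2 in product([0, 1], repeat=2):
    s1, s2 = shortcut_predictor(c1, c2)
    assert knowledge(s1, s2) == knowledge(c1, c2)
    # ...yet its concepts are always wrong:
    assert (s1, s2) != (c1, c2)

print("shortcut achieves perfect label accuracy with 0% concept accuracy")
```

Label-level supervision alone cannot distinguish this predictor from one with correct concepts, which is why concept-quality metrics and verification procedures of the kind rsbench provides are needed.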