Neuro-symbolic predictors learn a mapping from sub-symbolic inputs to higher-level concepts and then carry out (probabilistic) logical inference on this intermediate representation. This setup offers clear advantages in terms of consistency with symbolic prior knowledge, and is often believed to provide interpretability benefits: because the learned concepts comply with the knowledge, they can be better understood by human stakeholders. However, it was recently shown that this setup is affected by reasoning shortcuts, whereby predictions attain high accuracy by leveraging concepts with unintended semantics, yielding poor out-of-distribution performance and compromising interpretability. In this short paper, we establish a formal link between reasoning shortcuts and the optima of the loss function, and identify situations in which reasoning shortcuts can arise. Based on this, we discuss limitations of natural mitigation strategies such as reconstruction and concept supervision.
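To make the notion of a reasoning shortcut concrete, the following is a minimal sketch on a toy task that is not taken from the paper. It assumes the prior knowledge is that the label is the XOR of two binary concepts, and, as a simplification, it lets the concept extractor act directly on the ground-truth concepts instead of on sub-symbolic inputs. The names `knowledge`, `intended`, `shortcut`, `label_accuracy`, and `concept_accuracy` are hypothetical and introduced only for illustration.

```python
from itertools import product

def knowledge(c1, c2):
    # Assumed prior knowledge for this toy task: the label is c1 XOR c2.
    return c1 ^ c2

def intended(g):
    # Concept extractor with the intended semantics: predicts the true concepts.
    return g

def shortcut(g):
    # Concept extractor with unintended semantics: flips every concept.
    return tuple(1 - b for b in g)

# All four ground-truth concept configurations.
WORLDS = list(product([0, 1], repeat=2))

def label_accuracy(extractor):
    # Fraction of configurations whose inferred label matches the true label.
    hits = sum(knowledge(*extractor(g)) == knowledge(*g) for g in WORLDS)
    return hits / len(WORLDS)

def concept_accuracy(extractor):
    # Fraction of individual concepts that match the ground truth.
    hits = sum(c == t for g in WORLDS for c, t in zip(extractor(g), g))
    return hits / (2 * len(WORLDS))

for name, extractor in [("intended", intended), ("shortcut", shortcut)]:
    print(name, label_accuracy(extractor), concept_accuracy(extractor))
```

In this sketch both extractors predict every label correctly, because flipping both inputs leaves XOR unchanged, yet the flipped extractor gets every concept wrong. Label supervision alone therefore cannot distinguish the two in this toy setting, which is the kind of situation the abstract describes.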