Concept-based Models are neural networks that learn a concept extractor to map inputs to high-level concepts and an inference layer to translate these concepts into predictions. Ensuring these modules produce interpretable concepts and behave reliably in out-of-distribution settings is crucial, yet the conditions for achieving this remain unclear. We study this problem by establishing a novel connection between Concept-based Models and reasoning shortcuts (RSs), a common issue where models achieve high accuracy by learning low-quality concepts, even when the inference layer is fixed and provided upfront. Specifically, we extend RSs to the more complex setting of Concept-based Models and derive theoretical conditions for identifying both the concepts and the inference layer. Our empirical results highlight the impact of RSs and show that existing methods, even when combined with multiple natural mitigation strategies, often fail to meet these conditions in practice.