Concept bottleneck models (CBMs) are a class of interpretable neural network models that predict the target response of an input from its high-level concepts. Unlike standard end-to-end models, CBMs let domain experts intervene on the predicted concepts and correct mistakes at test time, which yields more accurate task predictions. Although this intervenability offers a powerful avenue of control, many aspects of the intervention procedure remain largely unexplored. In this work, we develop several methods for selecting which concepts to intervene on so as to improve intervention effectiveness, and we conduct an array of in-depth analyses of how these methods behave under different circumstances. Specifically, we find that in realistic settings an informed intervention strategy can reduce the task error by more than a factor of ten compared to the current baseline for the same number of interventions; this gain, however, can vary significantly with the granularity of intervention. We verify our findings through comprehensive evaluations, not only on standard real datasets but also on synthetic datasets that we generate from a set of different causal graphs. We further identify some major pitfalls of current practices which, unless properly addressed, raise concerns about the reliability and fairness of the intervention procedure.
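To make the idea of test-time intervention concrete, the following is a minimal sketch rather than the method evaluated in this work. It assumes binary concepts, a linear task predictor on top of the concept probabilities, and two hypothetical selection rules: a random baseline and an uncertainty-based rule that picks the concepts whose predicted probability is closest to 0.5. All function names, weights, and numbers are illustrative assumptions.

```python
import numpy as np


def predict_task(concepts, weights, bias):
    """Predict a class index from concept values with a linear task head (assumed)."""
    logits = weights @ concepts + bias
    return int(np.argmax(logits))


def intervene(concept_probs, true_concepts, indices):
    """Return a copy in which the selected concepts are replaced by expert-provided values."""
    corrected = concept_probs.copy()
    corrected[indices] = true_concepts[indices]
    return corrected


def select_random(concept_probs, k, rng):
    """Baseline (assumed): choose k distinct concepts uniformly at random."""
    return rng.choice(len(concept_probs), size=k, replace=False)


def select_most_uncertain(concept_probs, k):
    """Illustrative informed rule: choose the k concepts closest to probability 0.5."""
    distance_from_half = np.abs(concept_probs - 0.5)
    return np.argsort(distance_from_half, kind="stable")[:k]


if __name__ == "__main__":
    # Toy example: four concepts, two classes; the true class is 0.
    concept_probs = np.array([0.9, 0.45, 0.1, 0.55])
    true_concepts = np.array([1.0, 1.0, 0.0, 0.0])
    weights = np.array([[1.0, 2.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 4.0]])
    bias = np.zeros(2)

    before = predict_task(concept_probs, weights, bias)
    chosen = select_most_uncertain(concept_probs, k=2)
    corrected = intervene(concept_probs, true_concepts, chosen)
    after = predict_task(corrected, weights, bias)
    print("chosen concepts:", sorted(chosen.tolist()))
    print("prediction before / after intervention:", before, after)
```

In this toy setting, correcting the two least confident concepts is enough to change the task prediction, whereas a random choice of two concepts may or may not do so. Which selection rule works best in practice depends on the model and data.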