Concept Bottleneck Models (CBMs) improve the explainability of black-box Deep Learning (DL) models by introducing intermediate semantic concepts. However, standard CBMs often overlook domain-specific relationships and causal mechanisms, and their dependence on complete concept labels limits their applicability in scientific domains where supervision is sparse but processes are well defined. To address this, we propose the Process-Guided Concept Bottleneck Model (PG-CBM), an extension of CBMs that constrains learning to follow domain-defined causal mechanisms through biophysically meaningful intermediate concepts. Using above-ground biomass density estimation from Earth Observation data as a case study, we show that PG-CBM reduces error and bias compared with multiple benchmarks, while leveraging multi-source heterogeneous training data and producing interpretable intermediate outputs. Beyond improved accuracy, PG-CBM enhances transparency, enables the detection of spurious learning, and provides scientific insight, representing a step toward more trustworthy AI systems in scientific applications.