While Prover-Verifier Games (PVGs) offer a promising path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, expressive concept encodings can translate such data into interpretable concepts, but they are typically paired with low-capacity linear predictors. In this work, we push toward real-world verifiability by combining the strengths of both approaches. We introduce the Neural Concept Verifier (NCV), a unified framework that combines PVGs for formal verifiability with concept encodings for handling complex, high-dimensional inputs in an interpretable way. NCV uses recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A prover then selects a subset of these encodings, and a verifier, implemented as a nonlinear predictor, bases its decision exclusively on that subset. Our evaluations show that NCV outperforms classic concept-based models and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and helps mitigate shortcut behavior. Overall, we demonstrate NCV as a promising step toward concept-level, verifiable AI.
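The data flow described above (concept encoding, prover selection, verifier decision) can be illustrated with a minimal sketch. It is not the NCV implementation: the top-k selection rule, the zero-masking of unselected concepts, the one-hidden-layer verifier, the random weights, and all function names (`prover_select`, `apply_mask`, `verifier`) are assumptions made purely for illustration, and no training or game objective is shown.

```python
import math
import random


def prover_select(scores, k):
    """Hypothetical prover: keep the k highest-scoring concepts as a binary mask."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    keep = set(top)
    return [1 if i in keep else 0 for i in range(len(scores))]


def apply_mask(concepts, mask):
    """Zero out every concept the prover did not select (assumed masking scheme)."""
    return [c * m for c, m in zip(concepts, mask)]


def verifier(masked, w1, b1, w2, b2):
    """Toy nonlinear verifier: one ReLU hidden layer followed by a softmax."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, masked)) + b)
        for row, b in zip(w1, b1)
    ]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
n_concepts, n_hidden, n_classes, k = 8, 4, 3, 2

# Random, untrained weights: placeholders for a learned verifier.
w1 = [[rng.uniform(-1, 1) for _ in range(n_concepts)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_classes)]
b2 = [0.0] * n_classes

# Stand-ins for a concept discovery model's output and the prover's scores.
concepts = [rng.uniform(0, 1) for _ in range(n_concepts)]
scores = [rng.uniform(0, 1) for _ in range(n_concepts)]

mask = prover_select(scores, k)
probs = verifier(apply_mask(concepts, mask), w1, b1, w2, b2)

# Changing concepts the prover did NOT select leaves the decision untouched,
# because the verifier only ever sees the masked encoding.
perturbed = [c if m else c + 5.0 for c, m in zip(concepts, mask)]
probs_perturbed = verifier(apply_mask(perturbed, mask), w1, b1, w2, b2)
```

In this sketch the verifier's output is identical for `concepts` and `perturbed`, which is the property that makes the selected subset a complete account of what the decision was based on. Zero-masking is a simplification: it cannot distinguish an unselected concept from a selected concept whose value is zero.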