Ante-hoc interpretability has become the holy grail of explainable artificial intelligence for high-stakes domains such as healthcare; however, this notion is elusive, lacks a widely accepted definition and depends on the operational context. It can refer to predictive models whose structure adheres to domain-specific constraints, or to ones that are inherently transparent. The latter conceptualisation assumes observers who judge this quality, whereas the former presupposes that they have technical and domain expertise (thus alienating other groups of explainees). Additionally, the distinction between ante-hoc interpretability and the less desirable post-hoc explainability, which refers to methods that construct a separate explanatory model, is vague given that transparent predictive models may still require (post-)processing to yield suitable explanatory insights. Ante-hoc interpretability is thus an overloaded concept that comprises a range of implicit properties, which we unpack in this paper to better understand what is needed for its safe adoption across high-stakes domains. To this end, we outline modelling and explaining desiderata that allow us to navigate its distinct realisations in view of the envisaged application and audience.