This paper reports a case study of how explainability requirements were elicited during the development of an AI system for predicting cerebral palsy (CP) risk in infants. Over 18 months, we followed a development team and hospital clinicians as they sought to design explanations that would make the AI system trustworthy. Contrary to the assumption that users need detailed explanations of a system's inner workings, our findings show that clinicians trusted the system when it enabled them to evaluate its predictions against their own assessments. A simple prediction graph proved effective precisely because it supported clinicians' existing decision-making practices. Drawing on concepts from both Requirements Engineering and Explainable AI, we use the theoretical lens of Evaluative AI to introduce the notion of Evaluative Requirements: system requirements that allow users to scrutinize AI outputs and compare them with their own assessments. Our study demonstrates that such requirements are best discovered through the well-established methods of iterative prototyping and observation, and that they are essential for building trustworthy AI systems in expert domains.