Many ethical frameworks require artificial intelligence (AI) systems to be explainable. Explainable AI (XAI) models are frequently tested for their adequacy in user studies. Since different people may have different explanatory needs, participant samples in user studies must be large enough to represent the target population if the findings are to be generalized. However, it is unclear to what extent XAI researchers reflect on and justify their sample sizes or avoid broad generalizations across people. We analyzed XAI user studies (n = 220) published between 2012 and 2022. Most studies did not offer rationales for their sample sizes. Moreover, most papers generalized their conclusions beyond their target population, and there was no evidence that broader conclusions in quantitative studies were correlated with larger samples. These methodological problems can impede evaluations of whether XAI systems implement the explainability called for in ethical frameworks. We outline principles for more inclusive XAI user studies.