Explainable Information Retrieval (XIR) is a growing research area focused on enhancing the transparency and trustworthiness of the complex decision-making processes in modern information retrieval systems. While there has been progress in developing XIR systems, empirical evaluation tools for assessing the degree of explainability such systems attain are lacking. To close this gap and gain insight into the true merit of XIR systems, we extend existing insights from a factor analysis of search explainability to introduce SSE (Search System Explainability), an evaluation metric for XIR systems. Through a crowdsourced user study, we demonstrate SSE's ability to distinguish between explainable and non-explainable systems, showing that higher SSE scores indeed indicate greater interpretability. Additionally, we observe comparable perceived temporal demand and performance levels between non-native and native English speakers. We hope that, aside from these concrete contributions to XIR, this line of work will serve as a blueprint for similar explainability evaluation efforts in other domains of machine learning and natural language processing.