Explainability has become a crucial concern for machine learning and deep learning models, with the aim of making their behavior more transparent. Information retrieval is no exception to this trend. The existing literature on explainability in information retrieval has predominantly focused on explaining relevance with respect to a retrieval model. The questions addressed include why a document is relevant to a query, why one document is more relevant than another, and why a specific set of documents is deemed relevant to a query. However, limited attention has been given to understanding why a particular document is not favored (e.g., not ranked within the top-K) for a given query and retrieval model. To address this gap, our work focuses on the question of which terms need to be added to a document to improve its ranking. This, in turn, answers the question of which words, through their absence from the document, contributed to it not being favored by a retrieval model for a particular query. We use a counterfactual framework to solve this research problem. To the best of our knowledge, this is the first attempt to tackle this specific counterfactual problem (i.e., examining which words' absence can affect the ranking of a document). Our experiments show the effectiveness of our proposed approach in predicting counterfactuals for both statistical (e.g., BM25) and deep-learning-based models (e.g., DRMM, DSSM, ColBERT, MonoT5).
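To make the counterfactual question concrete, the following is a minimal sketch of one way to ask "which terms, if added, would move this document into the top-K?" for a BM25 ranker. It is not the approach proposed in this work, which the abstract does not detail: the greedy search, the restriction of candidate terms to the query, the toy corpus, and the names `bm25_score`, `rank_of`, and `counterfactual_terms` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of `doc` (a token list) for `query`, with statistics from `corpus`."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)
        # Lucene-style IDF, which stays non-negative for very common terms.
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def rank_of(query, docs, target):
    """1-based rank of docs[target]: one plus the number of strictly higher-scoring documents."""
    scores = [bm25_score(query, d, docs) for d in docs]
    return 1 + sum(s > scores[target] for s in scores)


def counterfactual_terms(query, docs, target, k, max_added=5):
    """Greedily append query terms to docs[target] until it ranks within the top-k.

    Hypothetical helper: returns (added_terms, final_rank). The input is not mutated.
    """
    docs = [list(d) for d in docs]
    candidates = list(dict.fromkeys(query))  # unique query terms, in order
    added = []
    while rank_of(query, docs, target) > k and len(added) < max_added:
        best_term = None
        best_score = bm25_score(query, docs[target], docs)
        for term in candidates:
            # Re-score with the term appended, so corpus statistics reflect the edit.
            trial = docs[:target] + [docs[target] + [term]] + docs[target + 1:]
            s = bm25_score(query, trial[target], trial)
            if s > best_score:
                best_term, best_score = term, s
        if best_term is None:  # no single addition improves the score
            break
        docs[target].append(best_term)
        added.append(best_term)
    return added, rank_of(query, docs, target)


# Toy corpus (assumed for illustration): the third document shares no term with the query.
docs = [
    "neural ranking models for document retrieval".split(),
    "bm25 is a classic retrieval function".split(),
    "cats sleep most of the day".split(),
]
query = "bm25 retrieval".split()

initial_rank = rank_of(query, docs, target=2)
added, final_rank = counterfactual_terms(query, docs, target=2, k=1)
print(initial_rank, added, final_rank)
```

The returned term list is the counterfactual explanation in this toy setting: these are the words whose absence kept the document out of the top-K. A neural ranker such as ColBERT or MonoT5 exposes no closed-form term weights, so a search of this kind would need a different candidate generator and scoring loop.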