Neural ranking models have become increasingly popular in real-world search and recommendation systems in recent years. Unlike their tree-based counterparts, neural models are much less interpretable: it is very difficult to understand their inner workings and to answer questions such as "How do they make their ranking decisions?" or "Which document features do they find important?" This is a particular disadvantage, since interpretability is highly important for real-world systems. In this work, we explore feature selection for neural learning-to-rank (LTR). In particular, we investigate six widely used methods from the field of interpretable machine learning (ML) and introduce our own modification, to select the input features that are most important to the ranking behavior. To understand whether these methods are useful for practitioners, we further study whether they contribute to efficiency enhancement. Our experimental results reveal a large feature redundancy in several LTR benchmarks: the local selection method TabNet can achieve optimal ranking performance with fewer than 10 features; the global methods, particularly our G-L2X, require slightly more selected features but exhibit higher potential for improving efficiency. We hope that our analysis of these feature selection methods will bring the fields of interpretable ML and LTR closer together.