This article investigates whether fine-tuning on natural language inference (NLI) data can improve information retrieval and ranking. We evaluate this for both English and Polish, using data from one of the largest Polish e-commerce sites and selected open-domain datasets. We employ monolingual and multilingual sentence encoders fine-tuned with a supervised method that combines a contrastive loss with NLI data. Our results show that NLI fine-tuning improves model performance on both tasks and in both languages, and that it can benefit both monolingual and multilingual models. Finally, we analyze the uniformity and alignment of the embeddings to explain the effect of NLI-based fine-tuning in an out-of-domain use case.