Improving the Efficiency and Effectiveness of LLM Knowledge Distillation for Conversational Search

Conversational Search (CS) is the task of retrieving relevant documents given a conversational context. Large Language Models (LLMs) have significantly enhanced CS by enabling effective query rewriting. However, employing LLMs during inference poses efficiency challenges. One way to balance effectiveness and efficiency is to distill knowledge from LLM-based query rewriting. Recent work applies the Kullback-Leibler Divergence (KLD) for distillation, relaxing the alignment with the teacher signal compared to previous methods. Despite these gains, several aspects of KLD-based distillation for conversational search remain understudied, and we investigate them in this work. Prior work in related fields suggests that adding a contrastive loss to the KLD objective can improve performance; we confirm this and observe significant gains in precision-oriented ranking metrics. We also find that contrastive sampling strategies for the KLD loss have a non-trivial impact and must be chosen carefully. Although theory suggests that more samples improve the KLD estimate, our experiments show diminishing returns as the number of samples grows. Finally, we address the decrease in sparsity observed in longer conversations, which limits computational efficiency across sparse retrieval methods. We find that the representations of the model distilled with the KLD loss can be strongly regularized with a regularization loss, substantially improving sparsity and inference efficiency without significantly harming retrieval effectiveness. We achieve a $2\times$ decrease in FLOPS on TopiOCQA with negligible loss in effectiveness, corresponding to a $\leq 2\%$ drop in Recall@100. Our results provide insights into distillation objectives for learned sparse conversational retrievers and offer practical guidelines for improving effectiveness and efficiency in first-stage retrieval.
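
The abstract does not spell out its loss formulas, so the following is only a minimal sketch of how a KLD distillation term, a contrastive term, and a sparsity regularizer could be combined into one training objective. Several things here are assumptions rather than the authors' method: the contrastive term is written as a standard InfoNCE loss, the regularizer is a FLOPS-style penalty of the kind used in learned sparse retrieval, and the function names and the weights `alpha` and `lam` are hypothetical. Teacher scores stand for document scores obtained with the LLM-rewritten query, and student scores for those obtained from the raw conversation.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def kld_loss(teacher_scores, student_scores):
    """KL(teacher || student) over the same sampled candidate documents."""
    log_p = log_softmax(teacher_scores)
    log_q = log_softmax(student_scores)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def contrastive_loss(student_scores, positive_index=0):
    """InfoNCE (assumed form): negative log-probability of the positive."""
    return -log_softmax(student_scores)[positive_index]


def flops_regularizer(batch_reps):
    """FLOPS-style penalty (assumed form): for each vocabulary dimension,
    square the mean absolute weight across the batch, then sum."""
    n = len(batch_reps)
    dim = len(batch_reps[0])
    return sum(
        (sum(abs(rep[j]) for rep in batch_reps) / n) ** 2 for j in range(dim)
    )


def total_loss(teacher_scores, student_scores, batch_reps, alpha=1.0, lam=0.01):
    """Hypothetical weighted sum of the three terms."""
    return (
        kld_loss(teacher_scores, student_scores)
        + alpha * contrastive_loss(student_scores)
        + lam * flops_regularizer(batch_reps)
    )
```

In this sketch, raising `lam` pushes more dimensions of the sparse representation toward zero, which is the kind of effectiveness-versus-efficiency trade-off the abstract reports, while the number of candidate scores passed to `kld_loss` plays the role of the number of samples used to estimate the divergence.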
