Recent research has explored distilling knowledge from large language models (LLMs) to optimize retriever models, especially within the retrieval-augmented generation (RAG) framework. However, most existing training methods extract supervision signals from LLMs' weights or output probabilities, which is not only resource-intensive but also incompatible with black-box LLMs. In this paper, we introduce \textit{Intermediate Distillation}, a data-efficient knowledge distillation training scheme that treats LLMs as black boxes and distills their knowledge through a novel LLM-ranker-retriever pipeline, using only the rankings generated by the LLM as the supervision signal. Extensive experiments demonstrate that our proposed method significantly improves the performance of retriever models with only 1,000 training instances. Moreover, our distilled retriever model substantially boosts performance on question-answering tasks within the RAG framework, demonstrating the potential of LLMs to train smaller models economically and effectively.
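As a rough illustration of how an LLM's ranking output might serve as the sole supervision signal for a retriever, the sketch below assumes a listwise distillation loss: the LLM's ordinal ranking of candidate passages is converted into a soft target distribution (here via a simple reciprocal-rank weighting, a hypothetical choice), and the retriever's score distribution is pulled toward it with a KL divergence. The function name, weighting scheme, and loss form are illustrative assumptions, not the paper's exact formulation.

\begin{verbatim}
import torch
import torch.nn.functional as F

def ranking_distillation_loss(retriever_scores, llm_ranking, temperature=1.0):
    """Listwise distillation loss (illustrative sketch).

    retriever_scores: (num_passages,) similarity scores from the retriever.
    llm_ranking: passage indices ordered from most to least relevant,
                 as produced by the black-box LLM.
    """
    num_passages = retriever_scores.size(0)
    # Convert the ordinal ranking into soft target weights: higher-ranked
    # passages receive larger weight (reciprocal-rank scheme, an assumption).
    target_weights = torch.zeros(num_passages)
    for rank, idx in enumerate(llm_ranking):
        target_weights[idx] = 1.0 / (rank + 1)
    target_dist = target_weights / target_weights.sum()
    # KL divergence between the retriever's softmax scores and the target.
    log_probs = F.log_softmax(retriever_scores / temperature, dim=0)
    return F.kl_div(log_probs, target_dist, reduction="sum")

# Example: the LLM ranked passage 2 first, then 0, then 1.
scores = torch.tensor([1.2, -0.3, 2.1], requires_grad=True)
loss = ranking_distillation_loss(scores, llm_ranking=[2, 0, 1])
loss.backward()
\end{verbatim}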