Large language models have recently enabled a generative paradigm for query expansion, but their high inference cost makes them difficult to deploy directly in practical retrieval systems. To address this issue, a retrieval-feedback-driven distillation and preference-alignment framework is proposed to transfer retrieval-friendly expansion behavior from a strong teacher model to a compact student model. Rather than relying on few-shot exemplars at inference time, the framework first leverages two complementary types of teacher-generated expansions, produced under zero-shot and few-shot prompting conditions, as supervision signals for distillation and as candidate pools for preference construction. A retrieval-metric-driven strategy is then introduced to automatically form chosen/rejected expansion pairs according to their nDCG@10 differences, and Direct Preference Optimization (DPO) is applied to explicitly align the student's generation preferences with retrieval objectives. Experiments on TREC DL19/20/21 and MIRACL-zh show that the proposed approach preserves strong retrieval effectiveness while substantially reducing inference cost. In particular, the distilled Qwen3-4B student reaches about 97% of the DeepSeek-685B teacher's nDCG@10 on DL19 and remains effective on the Chinese MIRACL-zh benchmark, demonstrating strong practicality across both English and Chinese retrieval settings.
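To make the retrieval-metric-driven pair construction concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a hypothetical scoring callback `ndcg_at_10(query, expansion)` (which would internally run retrieval with the expanded query and evaluate against relevance judgments) and a `margin` threshold on score gaps; the callback name, the margin value, and the pairwise enumeration over the pooled zero-shot and few-shot candidates are all illustrative assumptions, since the abstract states only that chosen/rejected pairs are formed from nDCG@10 differences.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List


@dataclass
class PreferencePair:
    query: str
    chosen: str    # expansion whose run achieved the higher nDCG@10
    rejected: str  # expansion whose run achieved the lower nDCG@10


def build_preference_pairs(
    query: str,
    candidate_expansions: List[str],
    ndcg_at_10: Callable[[str, str], float],  # hypothetical retrieval-feedback scorer
    margin: float = 0.05,                     # assumed gap threshold, not from the source
) -> List[PreferencePair]:
    """Score each candidate expansion by the nDCG@10 of the run it induces,
    then keep every pair whose score gap is at least `margin`."""
    scored = [(exp, ndcg_at_10(query, exp)) for exp in candidate_expansions]
    pairs: List[PreferencePair] = []
    for (exp_a, score_a), (exp_b, score_b) in combinations(scored, 2):
        if abs(score_a - score_b) >= margin:
            chosen, rejected = (exp_a, exp_b) if score_a > score_b else (exp_b, exp_a)
            pairs.append(PreferencePair(query, chosen, rejected))
    return pairs
```

Under this reading, the resulting (query, chosen, rejected) triples are exactly the format consumed by standard DPO training, which then pushes the student model toward generating expansions that retrieve better under the same metric.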