Large-scale multi-tenant retrieval systems generate extensive query logs but lack the curated relevance labels needed for effective domain adaptation, leaving a large volume of "dark data" underutilized. The high cost of model updates compounds the problem: jointly fine-tuning the query and document encoders requires re-indexing the full corpus, which is impractical in multi-tenant settings with thousands of isolated indices. We introduce DevRev-Search, a passage retrieval benchmark for technical customer support built with a fully automated pipeline. Candidates are generated by fusing results from diverse sparse and dense retrievers, and an LLM-as-a-Judge then performs consistency filtering and relevance labeling. We further propose an Index-Preserving Adaptation strategy that fine-tunes only the query encoder, achieving strong performance gains while keeping document indices fixed. Experiments on DevRev-Search, SciFact, and FiQA-2018 show that Parameter-Efficient Fine-Tuning (PEFT) of the query encoder offers a favorable quality-efficiency trade-off, enabling scalable and practical adaptation of enterprise search.
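To make the candidate-generation step concrete, the following is a minimal sketch of rank fusion across several retrievers. The abstract does not say which fusion method is used; reciprocal rank fusion (RRF) is swapped in here as a common choice, and the function name, the constant `k=60`, and the passage IDs are all illustrative assumptions. The LLM-as-a-Judge filtering step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of passage IDs into one candidate list.

    Assumption: RRF is used purely for illustration; the source only says
    "fusion across diverse sparse and dense retrievers".
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a passage missing from a list contributes nothing.
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical top-3 results from one sparse and two dense retrievers.
sparse_run = ["p3", "p1", "p7"]
dense_run_a = ["p1", "p3", "p9"]
dense_run_b = ["p1", "p9", "p3"]

fused = reciprocal_rank_fusion([sparse_run, dense_run_a, dense_run_b])
print(fused)
```

Passages that several retrievers rank highly rise to the top of the fused list, which is the pool a downstream judge would then label.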
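The idea behind index-preserving adaptation can be illustrated with a toy sketch: document vectors are computed once and never changed, and only a small set of query-side parameters is trained. Everything below is an assumption made for illustration, not the authors' implementation: the linear adapter `W` stands in for whatever PEFT method is applied to the query encoder, the softmax cross-entropy loss over the indexed documents is one common training objective, and the vectors are synthetic.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def adapt(W, query):
    """Apply the trainable query-side adapter (a plain matrix-vector product)."""
    return [dot(row, query) for row in W]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_loss(W, queries, positives, doc_index):
    """Mean cross-entropy of the labeled document under softmax over the index."""
    total = 0.0
    for q, pos in zip(queries, positives):
        z = adapt(W, q)
        probs = softmax([dot(z, d) for d in doc_index])
        total -= math.log(probs[pos])
    return total / len(queries)


def top1(W, query, doc_index):
    z = adapt(W, query)
    scores = [dot(z, d) for d in doc_index]
    return scores.index(max(scores))


def train_query_adapter(queries, positives, doc_index, lr=0.1, epochs=200):
    dim = len(queries[0])
    # Identity init: the adapted encoder starts out equal to the base encoder.
    W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(epochs):
        for q, pos in zip(queries, positives):
            z = adapt(W, q)
            probs = softmax([dot(z, d) for d in doc_index])
            # dL/dz = sum_k (p_k - 1[k == pos]) * d_k
            grad_z = [0.0] * dim
            for k, d in enumerate(doc_index):
                coeff = probs[k] - (1.0 if k == pos else 0.0)
                for i in range(dim):
                    grad_z[i] += coeff * d[i]
            # dL/dW[i][j] = grad_z[i] * q[j]; document vectors are only read.
            for i in range(dim):
                for j in range(dim):
                    W[i][j] -= lr * grad_z[i] * q[j]
    return W


# Synthetic data: a frozen "index" of three document vectors, and queries
# whose base embeddings point at the wrong documents (a stand-in for domain shift).
doc_index = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frozen_copy = [row[:] for row in doc_index]
queries = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
positives = [0, 1, 2]

identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
loss_before = mean_loss(identity, queries, positives, doc_index)
W = train_query_adapter(queries, positives, doc_index)
loss_after = mean_loss(W, queries, positives, doc_index)
print(loss_before, loss_after)
```

Because `doc_index` is never written to, the same stored vectors serve both the base and the adapted query encoder, so no re-indexing is needed after training.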