Large-scale multi-tenant retrieval systems generate extensive query logs but lack the curated relevance labels needed for effective domain adaptation, leaving large volumes of underutilized "dark data". This challenge is compounded by the high cost of model updates: jointly fine-tuning query and document encoders requires re-indexing the full corpus, which is impractical in multi-tenant settings with thousands of isolated indices. We introduce DevRev-Search, a passage retrieval benchmark for technical customer support built with a fully automated pipeline. Candidate generation fuses results from diverse sparse and dense retrievers, and an LLM-as-a-Judge then performs consistency filtering and relevance labeling. We further propose an Index-Preserving Adaptation strategy that fine-tunes only the query encoder, achieving strong performance gains while keeping document indices fixed. Experiments on DevRev-Search, SciFact, and FiQA-2018 show that Parameter-Efficient Fine-Tuning (PEFT) of the query encoder delivers a favorable quality-efficiency trade-off, enabling scalable and practical enterprise search adaptation.
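The core property of Index-Preserving Adaptation is that document embeddings are computed once and never modified; only the query side learns. The toy sketch below illustrates that property in pure Python under stated assumptions: dot-product scoring, a fixed base query embedding, and a hypothetical trainable linear adapter `W` standing in for a PEFT module, trained with a softmax cross-entropy loss over the frozen index. The names `train_query_adapter` and `retrieve`, the loss, and the toy data are illustrative choices, not the method's actual implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    # Apply the (hypothetical) query-side adapter W to a base query embedding x.
    return [dot(row, x) for row in W]


def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_query_adapter(doc_index, queries, positives, dim, lr=0.5, epochs=200):
    """Learn W so that score(q, d) = <W q, d> ranks the labeled passage first.

    doc_index is a tuple of tuples and is only read, never written:
    the document index stays fixed, so no re-indexing is needed.
    """
    # Identity init: before training, the adapter leaves queries unchanged.
    W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(epochs):
        for q, pos in zip(queries, positives):
            wq = matvec(W, q)
            probs = softmax([dot(wq, d) for d in doc_index])
            # Gradient of -log p_pos w.r.t. Wq: sum_j p_j * d_j - d_pos
            grad_wq = [
                sum(p * d[k] for p, d in zip(probs, doc_index)) - doc_index[pos][k]
                for k in range(dim)
            ]
            # Chain rule through Wq: dL/dW[i][j] = grad_wq[i] * q[j]
            for i in range(dim):
                for j in range(dim):
                    W[i][j] -= lr * grad_wq[i] * q[j]
    return W


def retrieve(W, q, doc_index):
    # Return the index of the top-scoring document for query q.
    wq = matvec(W, q)
    scores = [dot(wq, d) for d in doc_index]
    return max(range(len(doc_index)), key=scores.__getitem__)


# Toy usage: a frozen two-passage index and queries whose labels
# disagree with what the untrained (identity) adapter would retrieve.
docs = ((1.0, 0.0), (0.0, 1.0))
queries = [(0.0, 1.0), (1.0, 0.0)]
labels = [0, 1]
W = train_query_adapter(docs, queries, labels, dim=2)
```

Because the trainable parameters live entirely on the query side, an updated adapter can be deployed per tenant without touching any stored document vectors, which is the motivation for restricting fine-tuning to the query encoder.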