Retrieving known items from vague descriptions, a task known as Tip-of-the-Tongue (ToT) retrieval, remains a significant challenge. We propose using a single call to a generic 8B-parameter LLM for query reformulation, bridging the gap between ill-formed ToT queries and specific information needs. This method is particularly effective where standard pseudo-relevance feedback fails due to poor initial recall. Crucially, our LLM is not fine-tuned for ToT or specific domains, demonstrating that the gains stem from our prompting strategy rather than model specialization. Rewritten queries feed a multi-stage pipeline: sparse retrieval (BM25), dense/late-interaction reranking (Contriever, E5-large-v2, ColBERTv2), monoT5 cross-encoding, and list-wise reranking (Qwen 2.5 72B). Experiments on the 2025 TREC-ToT datasets show that while raw queries yield poor performance, our lightweight pre-retrieval transformation improves recall by 20.61%. Subsequent reranking improves nDCG@10 by 33.88%, MRR by 29.92%, and MAP@10 by 29.98%, offering a cost-effective intervention that unlocks the potential of downstream rankers. Code and data: https://github.com/debayan1405/TREC-TOT-2025
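The staged design above (rewrite the query once, score with BM25, then hand candidates to rerankers) can be sketched as follows. This is a minimal, self-contained illustration, not the paper's implementation: the BM25 scorer is a textbook version of the formula, and `rewrite_query` and `rerank` are hypothetical stubs standing in for the single LLM reformulation call and the dense/cross-encoder/list-wise stages.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with standard BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(s)
    return scores

def rewrite_query(raw_query):
    # Hypothetical stub for the single LLM call that turns a vague
    # ToT description into a focused query; here just tokenization.
    return raw_query.lower().replace("?", "").replace(",", "").split()

def rerank(candidate_ids, top_k=10):
    # Hypothetical stub for the downstream reranking stages
    # (dense/late-interaction, monoT5, list-wise LLM in the paper);
    # here it simply truncates the BM25 ordering.
    return candidate_ids[:top_k]

docs = [
    "an old movie about a robot boy who wants to be real".split(),
    "documentary on deep sea creatures".split(),
    "a film where a robot child searches for his mother".split(),
]
query = rewrite_query("That robot kid movie, he wanted to become real?")
scores = bm25_scores(query, docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(rerank(ranked))  # doc ids, best first
```

In the paper's pipeline the first stage only needs high recall; the stubbed `rerank` is where the measured nDCG@10/MRR/MAP gains would come from.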