When re-finding items, users who forget or are uncertain about identifying details often rely on creative strategies for expressing their information needs: complex queries that describe content elements (e.g., book characters or events), information beyond the document text (e.g., descriptions of book covers), or personal context (e.g., when they read a book). This retrieval setting, called tip of the tongue (TOT), is especially challenging for models that rely heavily on lexical and semantic overlap between query and document text. In this work, we introduce a simple yet effective framework for handling such complex queries by decomposing the query into individual clues, routing those as sub-queries to specialized retrievers, and ensembling the results. This approach allows us to take advantage of off-the-shelf retrievers (e.g., CLIP for retrieving images of book covers) or incorporate retriever-specific logic (e.g., date constraints). We show that our framework, which incorporates query decompositions into retrievers, can improve gold book recall by up to 7% relative gain in Recall@5 on a new collection of 14,441 real-world query-book pairs from an online community for resolving TOT inquiries.
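The decompose, route, and ensemble steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the clue types (`plot`, `cover`, `date`), the toy retrievers with fixed outputs, and the hand-written decomposition are hypothetical, and the fusion step uses reciprocal rank fusion as a stand-in, since the abstract does not specify how results are ensembled.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A retriever maps a sub-query string to a ranked list of book IDs (best first).
Retriever = Callable[[str], List[str]]


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to a book's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, book_id in enumerate(ranking, start=1):
            scores[book_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda b: (-scores[b], b))


def retrieve(
    clues: Dict[str, str],
    retrievers: Dict[str, Retriever],
    top_n: int = 5,
) -> List[str]:
    """Route each clue to the retriever registered for its type, then fuse."""
    rankings = [
        retrievers[kind](text)
        for kind, text in clues.items()
        if kind in retrievers  # clues with no matching retriever are skipped
    ]
    return reciprocal_rank_fusion(rankings)[:top_n]


# Toy stand-ins for specialized retrievers (hypothetical, fixed outputs).
# In practice these could wrap a text retriever, an image model, or a date filter.
toy_retrievers: Dict[str, Retriever] = {
    "plot": lambda q: ["book_a", "book_b", "book_c"],
    "cover": lambda q: ["book_b", "book_a"],
    "date": lambda q: ["book_b", "book_c"],
}

# Hand-written decomposition of one TOT query into typed clues (assumed input).
clues = {
    "plot": "a girl finds a hidden door in her new house",
    "cover": "dark blue cover with a black cat",
    "date": "read it around 2005",
}

results = retrieve(clues, toy_retrievers)
print(results)
```

In this toy setup, a book that ranks well for several clues rises above one that ranks first for a single clue, which is the intuition behind ensembling specialized retrievers rather than relying on one query-document similarity score.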