Tip-of-the-tongue (TOT) search occurs when a user struggles to recall a specific identifier, such as a document title. Although TOT states are common, existing search systems often fail to support them effectively. Research on TOT retrieval is further constrained by the difficulty of collecting queries: current approaches rely heavily on community question-answering (CQA) websites, which leads to labor-intensive evaluation and domain bias. To overcome these limitations, we introduce two methods for eliciting TOT queries, one using large language models (LLMs) and one using human participants, to facilitate simulated evaluation of TOT retrieval systems. Our LLM-based TOT user simulator generates synthetic TOT queries at scale. In the Movie domain, the way these synthetic queries rank TOT retrieval systems correlates highly with the way CQA-based TOT queries rank them, and the synthetic queries exhibit high linguistic similarity to CQA-derived queries. For human-elicited queries, we developed an interface that uses visual stimuli to place participants in a TOT state, enabling the collection of natural queries. In the Movie domain, system rank correlation and linguistic similarity analyses confirm that human-elicited queries are effective for evaluation and closely resemble CQA-based queries. These approaches reduce reliance on CQA-based data collection while expanding coverage to underrepresented domains, such as Landmark and Person. LLM-elicited queries for the Movie, Landmark, and Person domains have been released as test queries in the TREC 2024 TOT track, with human-elicited queries scheduled for inclusion in the TREC 2025 TOT track. We also provide source code for synthetic query generation and the human query collection interface, along with the curated visual stimuli used for eliciting TOT queries.
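The system rank correlation mentioned above can be illustrated with a small sketch. The abstract does not name the correlation coefficient, so Kendall's tau (the tau-a variant, which ignores ties) is an assumption here, and the system names and effectiveness scores below are invented purely for illustration; they are not results from this work.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts keyed by system name.

    Each dict maps a retrieval system to its effectiveness score under
    one query set. Only systems present in both dicts are compared.
    """
    systems = sorted(set(scores_a) & set(scores_b))
    if len(systems) < 2:
        raise ValueError("need at least two shared systems")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Positive product: both query sets order the pair the same way.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for four hypothetical systems (illustration only).
cqa_scores = {"bm25": 0.12, "dense": 0.25, "rerank": 0.31, "llm": 0.40}
synthetic_scores = {"bm25": 0.10, "dense": 0.28, "rerank": 0.26, "llm": 0.45}

tau = kendall_tau(cqa_scores, synthetic_scores)
print(f"Kendall's tau: {tau:.3f}")
```

A value near 1 would indicate that the two query sets order the systems almost identically, which is the property a synthetic or human-elicited query set needs in order to stand in for CQA-based queries during evaluation.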