Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval

Query suggestion, a technique widely adopted in information retrieval, enhances system interactivity and the browsing experience of document collections. In cross-modal retrieval, many works have focused on retrieving relevant items from natural language queries, while few have explored query suggestion. In this work, we address query suggestion in cross-modal retrieval by introducing a novel task: suggesting the minimal textual modifications needed to explore visually consistent subsets of the collection, following the premise of "Maybe you are looking for". To facilitate the development and evaluation of methods, we present a tailored benchmark named CroQS. The dataset comprises initial queries, grouped result sets, and human-defined suggested queries for each group. We establish dedicated metrics to evaluate methods on this task, measuring the representativeness and cluster specificity of the suggested queries and their similarity to the original ones. Baseline methods from related fields, such as image captioning and content summarization, are adapted to this task to provide reference performance scores. Although still relatively far from human performance, both LLM-based and captioning-based methods achieve competitive results on CroQS in our experiments, improving recall on cluster specificity by more than 115% and representativeness mAP by more than 52% with respect to the initial query. The dataset, the implementation of the baseline methods, and the notebooks containing our experiments are available here: https://paciosoft.com/CroQS-benchmark/
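To make the reported relative gains concrete, the following is a minimal sketch of how a recall score against one result group, and its improvement over the initial query, could be computed. It is an assumption for illustration only: the abstract does not define the metrics, so `recall_at_k`, `relative_improvement`, the cutoff `k`, and the toy image identifiers are all hypothetical and not taken from the CroQS implementation.

```python
# Illustrative sketch only: the metric definitions below are assumptions,
# not the official CroQS metrics.

def recall_at_k(ranked_ids, cluster_ids, k):
    """Fraction of the cluster's items found in the top-k results.

    Assumes `ranked_ids` contains no duplicates.
    """
    cluster = set(cluster_ids)
    if not cluster:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in cluster)
    return hits / len(cluster)


def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, expressed in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Hypothetical toy data: one visually consistent group of images, plus the
# rankings returned for the initial query and for a suggested query.
cluster = ["img3", "img7", "img9", "img12"]
initial_ranking = ["img1", "img3", "img2", "img5", "img8"]
suggested_ranking = ["img3", "img7", "img1", "img9", "img4"]

r_init = recall_at_k(initial_ranking, cluster, k=5)
r_sugg = recall_at_k(suggested_ranking, cluster, k=5)
gain = relative_improvement(r_init, r_sugg)

print(f"initial recall@5:   {r_init:.2f}")
print(f"suggested recall@5: {r_sugg:.2f}")
print(f"relative gain:      {gain:.0f}%")
```

Under this reading, a gain above 100% means the suggested query recovers more than twice as much of the target group as the initial query does.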

