Image retrieval aims to retrieve images that match a given query. In practice, users want to express their retrieval intent through a variety of query styles. However, current retrieval work focuses predominantly on text queries, which limits the available query options and can introduce ambiguity or bias in conveying user intent. In this paper, we propose the Style-Diversified Query-Based Image Retrieval task, which enables retrieval from queries in various styles. To support this new setting, we present the first Diverse-Style Retrieval dataset, covering query styles including text, sketch, low-resolution, and art. We also propose a lightweight style-diversified retrieval framework. For query inputs of different styles, we apply the Gram matrix to extract the query's textural features and cluster them into a style space with style-specific bases. We then employ a style-init prompt tuning module so that the visual encoder can comprehend the texture and style information of the query. Experiments demonstrate that our model, with the style-init prompt tuning strategy, outperforms existing retrieval models on the style-diversified retrieval task. Moreover, our model can retrieve with combined style-diversified queries (sketch+text, art+text, etc.) simultaneously, and the auxiliary information from the other queries improves retrieval performance for each individual query.
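As a rough illustration of the Gram-matrix step mentioned in the abstract, the following minimal sketch computes a Gram matrix from a feature map and assigns the resulting style vector to its nearest style basis. The function names (`gram_matrix`, `style_vector`, `nearest_basis`), the normalization constant, the flattening of the upper triangle, and the Euclidean nearest-basis assignment are all assumptions made for illustration; the abstract does not specify these details, and the actual framework may differ.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a (C, H, W) feature map.

    Entry (i, j) is the inner product of channels i and j over all spatial
    positions. Dividing by C*H*W is an assumed normalization.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)


def style_vector(features: np.ndarray) -> np.ndarray:
    """Flatten the upper triangle of the (symmetric) Gram matrix."""
    g = gram_matrix(features)
    rows, cols = np.triu_indices(g.shape[0])
    return g[rows, cols]


def nearest_basis(vec: np.ndarray, bases: np.ndarray) -> int:
    """Index of the closest style basis (hypothetical assignment rule).

    `bases` has shape (K, D), one row per style-specific basis.
    """
    distances = np.linalg.norm(bases - vec, axis=1)
    return int(np.argmin(distances))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for an encoder feature map of one query: 8 channels, 16x16.
    query_features = rng.standard_normal((8, 16, 16))
    vec = style_vector(query_features)
    # Stand-in style bases, e.g. one per query style (random here).
    bases = rng.standard_normal((4, vec.shape[0]))
    print("assigned style index:", nearest_basis(vec, bases))
```

In this sketch the bases are random placeholders; in a trained system they would presumably be obtained by clustering the style vectors of many queries.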