Image retrieval aims to retrieve images that match a given query. In practical applications, users express their retrieval intent through a variety of query styles. However, current retrieval tasks focus predominantly on text queries, which limits the available query options and can introduce ambiguity or bias in capturing user intent. In this paper, we propose the Style-Diversified Query-Based Image Retrieval task, which enables retrieval based on various query styles. To support this new setting, we introduce the first Diverse-Style Retrieval dataset, which covers diverse query styles including text, sketch, low-resolution, and art. We also propose a lightweight style-diversified retrieval framework. For each query style input, we apply the Gram matrix to extract the query's textural features and cluster them into a style space with style-specific bases. We then employ a style-init prompt tuning module that enables the visual encoder to comprehend the texture and style information of the query. Experiments demonstrate that our model, employing the style-init prompt tuning strategy, outperforms existing retrieval models on the style-diversified retrieval task. Moreover, our model can retrieve with several query styles simultaneously (sketch+text, art+text, etc.), and the auxiliary information from the other queries improves retrieval performance for each respective query.
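The Gram-matrix step described above can be illustrated with a minimal sketch. It assumes a query has already been encoded into a feature map of shape `(C, H, W)`, and it uses plain k-means with a deterministic farthest-point initialization to obtain the style bases. The abstract does not specify the encoder, the layer, or the clustering algorithm, so the function names `gram_matrix`, `style_descriptor`, and `cluster_style_bases` and the choice of k-means are assumptions made for illustration only.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Return the normalized Gram matrix of a (C, H, W) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # one row per channel
    return flat @ flat.T / (h * w)      # (C, C) channel co-activation statistics


def style_descriptor(features: np.ndarray) -> np.ndarray:
    """Flatten the upper triangle of the symmetric Gram matrix into a vector."""
    gram = gram_matrix(features)
    rows, cols = np.triu_indices(gram.shape[0])
    return gram[rows, cols]


def cluster_style_bases(descriptors: np.ndarray, num_styles: int, iters: int = 20):
    """Cluster style descriptors with k-means (an assumed stand-in).

    Returns (bases, labels): one base vector per style cluster and the
    cluster index assigned to each descriptor.
    """
    # Farthest-point initialization: start from the first descriptor, then
    # repeatedly add the descriptor farthest from all centers chosen so far.
    centers = [descriptors[0]]
    for _ in range(num_styles - 1):
        nearest = np.min(
            [np.linalg.norm(descriptors - c, axis=1) for c in centers], axis=0
        )
        centers.append(descriptors[nearest.argmax()])
    bases = np.stack(centers).astype(float)

    for _ in range(iters):
        # Assign each descriptor to its nearest base.
        dists = np.linalg.norm(descriptors[:, None, :] - bases[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each base to the mean of its members (skip empty clusters).
        for j in range(num_styles):
            members = descriptors[labels == j]
            if len(members):
                bases[j] = members.mean(axis=0)

    # Final assignment against the updated bases.
    dists = np.linalg.norm(descriptors[:, None, :] - bases[None, :, :], axis=2)
    return bases, dists.argmin(axis=1)
```

In this sketch, each query's feature map would be passed through `style_descriptor`, the resulting vectors stacked into one array, and `cluster_style_bases` called with the number of query styles. How the resulting bases are then used to initialize the prompt tokens is not detailed in the abstract and is left out here.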