Do Images Clarify? A Study on the Effect of Images on Clarifying Questions in Conversational Search

Conversational search systems increasingly employ clarifying questions to refine user queries and improve the search experience. Previous studies have demonstrated the usefulness of text-based clarifying questions in enhancing both retrieval performance and user experience. While images have been shown to improve retrieval performance in various contexts, their impact on user performance when incorporated into clarifying questions remains largely unexplored. We conduct a user study with 73 participants to investigate the role of images in conversational search, specifically examining their effects on two search-related tasks: (i) answering clarifying questions and (ii) query reformulation. We compare the effect of multimodal and text-only clarifying questions in both tasks within a conversational search context from various perspectives. Our findings reveal that while participants showed a strong preference for multimodal questions when answering clarifying questions, preferences were more balanced in the query reformulation task. The impact of images varied with both task type and user expertise. In answering clarifying questions, images helped maintain engagement across different expertise levels, while in query reformulation they led to more precise queries and improved retrieval performance. Interestingly, for clarifying question answering, text-only setups demonstrated better user performance as they provided more comprehensive textual information in the absence of images. These results provide valuable insights for designing effective multimodal conversational search systems, highlighting that the benefits of visual augmentation are task-dependent and should be strategically implemented based on the specific search context and user characteristics.
