The Impact of Snippet Reliability on Misinformation in Online Health Search

Search result snippets are crucial in modern search engines, providing users with a quick overview of a web page's content. Snippets help users determine the relevance of a document to their information needs, and in certain scenarios even enable them to satisfy those needs without visiting the documents themselves. Hence, it is crucial for snippets to reliably represent the content of their corresponding documents. While this may be a straightforward requirement for some queries, it becomes challenging in the complex domain of healthcare, where unreliable snippets can lead to misinformation. This paper examines how reliably snippets represent their corresponding documents, specifically in the health domain. To this end, we conduct a series of user studies using Google's search results, in which participants are asked to infer the viewpoints of search results for queries about the effectiveness of a medical intervention for a medical condition, based solely on the results' titles and snippets. Our findings reveal that a considerable portion of Google's snippets (28%) failed to present any viewpoint on the intervention's effectiveness, and that 35% were interpreted by participants as having a different viewpoint from that of their corresponding documents. To address this issue, we propose a snippet extraction solution tailored directly to users' information needs, i.e., extracting snippets that summarize documents' viewpoints regarding the intervention and condition that appear in the query. A user study demonstrates that our information need-focused solution outperforms the mainstream query-based approach, with only 19.67% of the snippets generated by our solution reported as not presenting a viewpoint and only 20.33% misinterpreted by participants. These results strongly suggest that an information need-focused approach can significantly improve the reliability of extracted snippets in online health search.
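The two reliability measures reported above (the share of snippets presenting no viewpoint, and the share interpreted differently from the underlying document) can be illustrated with a minimal sketch. The `Judgment` record, the viewpoint labels, and the choice of all judged snippets as the denominator for both rates are assumptions made for illustration; the abstract does not specify the study's exact labeling scheme or how its percentages were computed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Judgment:
    """One participant judgment for one search result (hypothetical schema)."""
    # Viewpoint inferred from title + snippet only; None means the
    # participant reported that no viewpoint was presented.
    snippet_view: Optional[str]
    # Viewpoint of the full document (assumed to be known from annotation).
    document_view: str


def reliability_rates(judgments: List[Judgment]) -> Tuple[float, float]:
    """Return (no_viewpoint_rate, misinterpreted_rate).

    Assumption: both rates use the total number of judgments as the
    denominator.
    """
    total = len(judgments)
    if total == 0:
        raise ValueError("at least one judgment is required")
    no_view = sum(1 for j in judgments if j.snippet_view is None)
    mismatch = sum(
        1
        for j in judgments
        if j.snippet_view is not None and j.snippet_view != j.document_view
    )
    return no_view / total, mismatch / total


# Toy data, not taken from the study.
toy = [
    Judgment("helps", "helps"),          # snippet agrees with document
    Judgment(None, "helps"),             # snippet presents no viewpoint
    Judgment("helps", "does not help"),  # snippet misrepresents document
    Judgment("does not help", "does not help"),
]
no_view_rate, mismatch_rate = reliability_rates(toy)
```

Under these assumptions, a lower value for either rate indicates snippets that more reliably convey the viewpoint of their documents.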
