The growing accessibility of Large Language Models through conversational interfaces that answer users' questions by drawing on, synthesizing, and citing information from the web (i.e., Generative Search Engines) has simplified the information-seeking process. However, with the proliferation of AI-generated content on the web, it is unclear whether these engines can reliably avoid citing synthetic (i.e., AI-generated) sources. If they cannot, users are at risk of harm, because information from AI-generated sources that is synthesized into an engine's response may be treated as equivalent to information from authoritative or official sources. As a step towards identifying whether these engines cite AI-generated sources, this work presents an audit of four generative search engines (ChatGPT, Copilot, Gemini, Perplexity) using a total of 712 real-world human-generated queries spanning domains of public importance: politics, health, and the environment. Our findings show evidence of AI-generated sources being cited across all four generative search engines (~16% of cited sources) and identify the key web domains these sources belong to that are frequently cited across engines and topics. In addition, we observed that generative search engines repeatedly cite a somewhat narrow set of domains while predominantly surfacing a large number of minimally cited domains in responses to users' queries. These findings contribute to the growing body of work on assessing the risks of generative search engines, with the objective of increasing public awareness of their limitations and encouraging appropriate measures to improve the information quality and governance of these systems.