Authority Signals in AI Cited Health Sources: A Framework for Evaluating Source Credibility in ChatGPT Responses

Health information seeking has fundamentally changed since the advent of large language models (LLMs), with nearly one third of ChatGPT's 800 million users asking health questions weekly. Understanding the sources behind those AI-generated responses is vital, because health organizations and providers are investing in digital strategies to organically improve their ranking, reach, and visibility in LLM systems such as ChatGPT. As AI search optimization strategies mature, this study introduces an Authority Signals Framework organized into four domains that reflect key components of health information seeking: "Who wrote it?" (Author Credentials), "Who published it?" (Institutional Affiliation), "How was it vetted?" (Quality Assurance), and "How does AI find it?" (Digital Authority). This descriptive cross-sectional study randomly selected 100 questions from HealthSearchQA, which contains 3,173 consumer health questions curated by Google Research from publicly available search engine suggestions. The questions were entered into ChatGPT 5.2 Pro, and the cited sources were recorded and coded against the four domains of the Authority Signals Framework. Descriptive statistics were calculated for all cited sources (n=615), and cross-tabulations were conducted to examine differences among organization types. Over 75% of the sources cited in ChatGPT's generated health responses came from established institutional sources, such as Mayo Clinic, Cleveland Clinic, Wikipedia, the National Health Service, and PubMed; the remaining citations came from alternative health information sources that lacked established institutional backing.
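The sampling and cross-tabulation steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the fixed seed, the record fields (`org_type`, `has_author_credentials`), and the toy records are all assumptions made for illustration, and the question list is a placeholder rather than the real HealthSearchQA data.

```python
import random
from collections import Counter


def sample_questions(questions, k=100, seed=0):
    """Draw k questions without replacement (seed is an assumed value)."""
    rng = random.Random(seed)
    return rng.sample(questions, k)


def crosstab(records, row_key, col_key):
    """Count coded sources for every (row value, column value) pair."""
    return Counter((r[row_key], r[col_key]) for r in records)


def proportion(records, key, value):
    """Share of records whose field `key` equals `value`."""
    if not records:
        return 0.0
    return sum(1 for r in records if r[key] == value) / len(records)


# Placeholder question pool with the same size as HealthSearchQA (3,173).
questions = [f"question {i}" for i in range(3173)]
sampled = sample_questions(questions)

# Hypothetical coded sources; real coding would cover all four domains.
coded = [
    {"org_type": "hospital", "has_author_credentials": True},
    {"org_type": "hospital", "has_author_credentials": False},
    {"org_type": "encyclopedia", "has_author_credentials": False},
    {"org_type": "government", "has_author_credentials": True},
]
table = crosstab(coded, "org_type", "has_author_credentials")
hospital_share = proportion(coded, "org_type", "hospital")
```

Sampling without replacement ensures no question is drawn twice, and the pairwise counter is one simple way to compare an authority signal across organization types.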

