AI-generated data contamination erodes pathological variability and diagnostic reliability

Hongyu He, Shaowen Xiang, Ye Zhang, Yingtao Zhu, Jin Zhang, Hao Deng, Emily Alsentzer, Qingyu Chen, Kun-Hsing Yu, Andrew Marshall, Tingting Chen, Srinivas Anumasa, Daniel Ebner, Dean Ho, Kee Yuan Ngiam, Ching-Yu Cheng, Dianbo Liu

Source: arXiv. *Corresponding author: Dianbo Liu ([email protected])

Generative artificial intelligence (AI) is rapidly populating medical records with synthetic content, creating a feedback loop where future models are increasingly at risk of training on uncurated AI-generated data. However, the clinical consequences of this AI-generated data contamination remain unexplored. Here, we show that in the absence of mandatory human verification, this self-referential cycle drives a rapid erosion of pathological variability and diagnostic reliability. By analysing more than 800,000 synthetic data points across clinical text generation, vision-language reporting, and medical image synthesis, we find that models progressively converge toward generic phenotypes regardless of the model architecture. Specifically, rare but critical findings, including pneumothorax and effusions, vanish from the synthetic content generated by AI models, while demographic representations skew heavily toward middle-aged male phenotypes. Crucially, this degradation is masked by false diagnostic confidence; models continue to issue reassuring reports while failing to detect life-threatening pathology, with false reassurance rates tripling to 40%. Blinded physician evaluation confirms that this decoupling of confidence and accuracy renders AI-generated documentation clinically useless after just two generations. We systematically evaluate three mitigation strategies, finding that while synthetic volume scaling fails to prevent collapse, mixing real data with quality-aware filtering effectively preserves diversity. Ultimately, our results suggest that without policy-mandated human oversight, the deployment of generative AI threatens to degrade the very healthcare data ecosystems it relies upon.
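The feedback loop described in the abstract can be illustrated with a deliberately simple toy model. The sketch below is not the authors' experimental pipeline. It assumes that each model generation slightly over-represents common findings, modelled here as temperature sharpening of a label distribution, and that mitigation consists only of mixing a fixed fraction of real data back in at every generation. The label names, the prevalences in `REAL`, the `temperature` value, and the functions `sharpen` and `simulate` are all hypothetical. Quality-aware filtering is not modelled.

```python
def sharpen(dist, temperature=0.7):
    """Toy stand-in for mode-seeking generation (an assumption, not the paper's method).

    Raises each probability to the power 1/temperature and renormalises,
    so common labels gain share and rare labels lose share when temperature < 1.
    """
    powered = {label: p ** (1.0 / temperature) for label, p in dist.items()}
    total = sum(powered.values())
    return {label: p / total for label, p in powered.items()}


def simulate(real, generations, real_fraction=0.0, temperature=0.7):
    """Return the label distribution at generation 0, 1, ..., generations.

    real_fraction is the share of real data mixed into each generation's
    training set; 0.0 means training purely on the previous model's output.
    """
    current = dict(real)
    history = [current]
    for _ in range(generations):
        synthetic = sharpen(current, temperature)
        current = {
            label: real_fraction * real[label]
            + (1.0 - real_fraction) * synthetic[label]
            for label in real
        }
        history.append(current)
    return history


# Hypothetical prevalences chosen for illustration only.
REAL = {"no acute finding": 0.80, "effusion": 0.15, "pneumothorax": 0.05}

if __name__ == "__main__":
    for frac in (0.0, 0.3):
        final = simulate(REAL, generations=5, real_fraction=frac)[-1]
        print(f"real_fraction={frac}:", {k: round(v, 4) for k, v in final.items()})
```

In this toy setting the rarest label shrinks at every purely synthetic generation, whereas with `real_fraction > 0` its share can never fall below `real_fraction` times its real prevalence. That floor is a property of the toy model only and says nothing about the effect sizes reported in the paper.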
