Generative artificial intelligence (AI) is rapidly populating medical records with synthetic content, creating a feedback loop in which future models are increasingly at risk of training on uncurated AI-generated data. However, the clinical consequences of this AI-generated data contamination remain unexplored. Here, we show that in the absence of mandatory human verification, this self-referential cycle drives a rapid erosion of pathological variability and diagnostic reliability. By analysing more than 800,000 synthetic data points across clinical text generation, vision-language reporting, and medical image synthesis, we find that models progressively converge toward generic phenotypes regardless of model architecture. Specifically, rare but critical findings, including pneumothorax and effusions, vanish from AI-generated synthetic content, while demographic representations skew heavily toward middle-aged male phenotypes. Crucially, this degradation is masked by false diagnostic confidence: models continue to issue reassuring reports while failing to detect life-threatening pathology, with false reassurance rates tripling to 40%. Blinded physician evaluation confirms that this decoupling of confidence and accuracy renders AI-generated documentation clinically useless after just two generations. We systematically evaluate three mitigation strategies, finding that although scaling synthetic data volume fails to prevent collapse, mixing in real data with quality-aware filtering effectively preserves diversity. Ultimately, our results suggest that without policy-mandated human oversight, the deployment of generative AI threatens to degrade the very healthcare data ecosystems it relies upon.