Large foundation models (LFMs) are transforming healthcare AI across prevention, diagnostics, and treatment. Yet whether LFMs can provide truly personalized treatment recommendations remains an open question. Recent research has revealed multiple challenges to personalization, including a fundamental generalizability paradox: models that achieve high accuracy in one clinical study perform at chance level in others, which shows that personalization and external validity are in tension. This paradox exemplifies broader contradictions in AI-driven healthcare: the privacy-performance paradox, the scale-specificity paradox, and the automation-empathy paradox. A further challenge is that it remains unclear how much causal understanding personalized recommendations require, as opposed to the mere predictive capacity of LFMs. N-of-1 trials (crossover self-experiments that are the gold standard for individual causal inference in personalized medicine) resolve these tensions by providing within-person causal evidence while preserving privacy through local experimentation. This paper argues that, despite their impressive capabilities, LFMs cannot replace N-of-1 trials. Instead, LFMs and N-of-1 trials are complementary: LFMs excel at rapid hypothesis generation from population patterns in multimodal data, while N-of-1 trials excel at causal validation for a given individual. We propose a hybrid framework that combines the strengths of both to enable personalization and navigate the identified paradoxes: LFMs generate ranked intervention candidates with uncertainty estimates, and these candidates trigger subsequent N-of-1 trials. Clarifying the boundary between prediction and causation, and explicitly addressing these paradoxical tensions, are essential for the responsible integration of AI into personalized medicine.
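The hybrid loop described above (model-ranked candidates with uncertainty, followed by within-person validation) can be illustrated with a minimal Python sketch. Everything here is a hypothetical illustration, not the framework's specified method: the `Candidate` structure, the normal-approximation interval with `z=1.96`, and the paired-difference analysis are assumptions, and the analysis further assumes that washout periods make carryover between crossover periods negligible.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass
class Candidate:
    """Hypothetical container for an intervention proposed by an LFM."""
    name: str
    predicted_effect: float  # assumed population-level point estimate from the model
    std_error: float         # assumed model uncertainty about that estimate


def rank_candidates(candidates, z=1.96):
    """Rank candidates by point estimate and attach a normal-approximation interval.

    Returns a list of (name, lower_bound, upper_bound) tuples, best first.
    A wide interval signals that the population-level prediction alone is
    not enough evidence for this individual.
    """
    ranked = sorted(candidates, key=lambda c: c.predicted_effect, reverse=True)
    return [
        (c.name, c.predicted_effect - z * c.std_error, c.predicted_effect + z * c.std_error)
        for c in ranked
    ]


def n_of_1_estimate(cycles):
    """Estimate a within-person effect from a multi-cycle crossover (N-of-1) trial.

    `cycles` holds one (outcome_on_control, outcome_on_treatment) pair per
    A/B cycle for a single participant. Returns the mean paired difference
    and its standard error; at least two cycles are required for stdev.
    """
    diffs = [treated - control for control, treated in cycles]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))


if __name__ == "__main__":
    # Hypothetical candidates and outcomes, for illustration only.
    ranked = rank_candidates([
        Candidate("intervention_A", 0.2, 0.15),
        Candidate("intervention_B", 0.5, 0.30),
    ])
    top_name, low, high = ranked[0]
    print(top_name, round(low, 3), round(high, 3))  # interval for the top-ranked candidate

    # The top-ranked candidate is then tested in a three-cycle crossover for one person.
    effect, se = n_of_1_estimate([(5.0, 6.0), (4.0, 6.0), (5.0, 6.0)])
    print(round(effect, 3), round(se, 3))
```

In this toy run the top-ranked candidate's interval spans zero, which is exactly the situation where the framework defers to an N-of-1 trial rather than acting on the prediction. The paired-difference estimator is used because it compares each person only with themselves; a real trial design would also need randomized period order and an appropriate model for autocorrelation.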