Generative Artificial Intelligence (GenAI) is taking the world by storm. It promises transformative opportunities for advancing and disrupting existing practices in many domains, including healthcare. From large language models (LLMs) for clinical note synthesis and conversational assistance to multimodal systems that integrate medical imaging, electronic health records, and genomic data for decision support, GenAI is transforming the practice of medicine and the delivery of healthcare, including diagnosis and personalized treatment. It has great potential to reduce the cognitive burden on clinicians and thereby improve overall healthcare delivery. However, deploying GenAI in healthcare requires an in-depth understanding of healthcare tasks and of what can and cannot be achieved. In this paper, we propose a data-centric paradigm for the design and deployment of GenAI systems for healthcare. Specifically, we reposition the data life cycle by making the medical data ecosystem the foundational substrate for generative healthcare systems. This ecosystem is designed to sustainably support the integration, representation, and retrieval of diverse medical data and knowledge. With effective and efficient data processing pipelines, such as semantic vector search and contextual querying, it enables GenAI-powered operations for both upstream model components and downstream clinical applications. Ultimately, it not only supplies foundation models with high-quality, multimodal data for large-scale pretraining and domain-specific fine-tuning, but also serves as a knowledge retrieval backend that supports task-specific inference via the agentic layer. The ecosystem thus enables the deployment of GenAI for high-quality and effective healthcare delivery.