Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research

Recent advances in generative models, including large language models (LLMs), vision language models (VLMs), and diffusion models, have accelerated the field of natural language and image processing in medicine and marked a significant paradigm shift in how biomedical models can be developed and deployed. While these models are highly adaptable to new tasks, scaling and evaluating their usage presents new challenges not addressed in previous frameworks. In particular, the ability of these models to produce useful outputs with little to no specialized training data ("zero-" or "few-shot" approaches), as well as the open-ended nature of their outputs, necessitate the development of updated guidelines in using and evaluating these models. In response to gaps in standards and best practices for the development of clinical AI tools identified by US Executive Order 141103 and several emerging national networks for clinical AI evaluation, we begin to formalize some of these guidelines by building on the "Minimum information about clinical artificial intelligence modeling" (MI-CLAIM) checklist. The MI-CLAIM checklist, originally developed in 2020, provided a set of six steps with guidelines on the minimum information necessary to encourage transparent, reproducible research for artificial intelligence (AI) in medicine. Here, we propose modifications to the original checklist that highlight differences in training, evaluation, interpretability, and reproducibility of generative models compared to traditional AI models for clinical research. This updated checklist also seeks to clarify cohort selection reporting and adds additional items on alignment with ethical standards.

翻译：近期生成模型（包括大语言模型、视觉语言模型和扩散模型）的进展加速了医学自然语言与图像处理领域的发展，并标志着生物医学模型开发与部署方式的重大范式转变。尽管这些模型对新任务具有高度适应性，但其规模化应用与评估仍面临既有框架未涵盖的新挑战。具体而言，这些模型在无需或仅需少量专业训练数据即可生成有用输出（"零样本"或"少样本"方法）的能力，以及其输出的开放式特性，亟需制定更新版的使用与评估指南。针对美国第141103号行政令及多个新兴临床AI评估国家级网络所识别的临床AI工具开发标准与最佳实践缺口，我们以"临床人工智能建模最低信息标准"核查清单为基础，开始系统化制定部分指南。该核查清单最初于2020年制定，包含六个步骤的指南，旨在规范医学人工智能研究中促进透明、可复现研究所需的最低信息。本文对原始核查清单提出修订，重点阐明生成模型与传统临床研究AI模型在训练、评估、可解释性及可复现性方面的差异。更新后的核查清单同时完善了队列选择报告规范，并增加了伦理合规性相关条目。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日