Med-Banana: Learning Quality-Controlled Medical Image Editing from Success-and-Failure Trajectories

Text-guided medical image editing must satisfy the requested pathology while preserving anatomy, modality-specific appearance, and clinical plausibility. However, existing datasets largely supervise editors with only the final accepted edits and discard the failed attempts produced during generation. We argue that these failures provide essential supervision for quality control: they specify what should be rejected, why an edit is medically or visually invalid, and how the instruction should be revised. We present Med-Banana, a trajectory-supervised framework for quality-controlled medical image editing. We introduce Med-Banana-80K, a large-scale resource of success-and-failure editing trajectories with candidate images, verification outcomes, rejection reasons, and prompt refinements. Building on it, Med-Banana jointly trains an editor, a verifier, and a refiner, enabling edit-verify-refine inference learned from both accepted and rejected attempts. Experiments spanning MLLM judges, blind expert assessment, source-preservation probes, and real-synthetic separability probes show consistent improvements over open medical image editors. Code and data are publicly available.
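
The edit-verify-refine inference described in the abstract can be pictured as a simple control loop. The sketch below is illustrative only: the abstract does not specify interfaces, so the function name `edit_verify_refine`, the `Attempt` record, the `max_rounds` cap, and the string-based toy components are all assumptions standing in for the trained editor, verifier, and refiner.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    """One step of a trajectory (hypothetical record layout)."""
    prompt: str      # instruction actually given to the editor
    candidate: str   # stand-in for the candidate edited image
    accepted: bool   # verification outcome
    reason: str      # rejection reason (empty when accepted)


def edit_verify_refine(
    source: str,
    instruction: str,
    editor: Callable[[str, str], str],
    verifier: Callable[[str, str, str], Tuple[bool, str]],
    refiner: Callable[[str, str], str],
    max_rounds: int = 3,  # assumed cap, not taken from the paper
) -> Tuple[Optional[str], List[Attempt]]:
    """Loop until the verifier accepts a candidate or rounds run out."""
    trajectory: List[Attempt] = []
    prompt = instruction
    for _ in range(max_rounds):
        candidate = editor(source, prompt)
        accepted, reason = verifier(source, candidate, instruction)
        trajectory.append(Attempt(prompt, candidate, accepted, reason))
        if accepted:
            return candidate, trajectory
        # Use the rejection reason to revise the prompt for the next try.
        prompt = refiner(prompt, reason)
    return None, trajectory


# Toy stand-ins operating on strings instead of images (assumptions).
def toy_editor(source: str, prompt: str) -> str:
    return f"{source}|{prompt}"


def toy_verifier(source: str, candidate: str, instruction: str) -> Tuple[bool, str]:
    if "preserve anatomy" in candidate:
        return True, ""
    return False, "anatomy not preserved"


def toy_refiner(prompt: str, reason: str) -> str:
    return f"{prompt}; preserve anatomy"


result, trajectory = edit_verify_refine(
    "xray", "add nodule", toy_editor, toy_verifier, toy_refiner
)
```

With these toy components the first attempt is rejected and the second is accepted, so `trajectory` holds one failed and one successful attempt, which mirrors the kind of success-and-failure record the abstract describes.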

