Despite recent progress, medical foundation models still struggle to unify visual understanding and generation, as these tasks have inherently conflicting goals: semantic abstraction versus pixel-level reconstruction. Existing approaches, typically based on parameter-shared autoregressive architectures, frequently compromise performance on one or both tasks. To address this, we present UniX, a next-generation unified medical foundation model for chest X-ray understanding and generation. UniX decouples the two tasks into an autoregressive branch for understanding and a diffusion branch for high-fidelity generation. Crucially, a cross-modal self-attention mechanism is introduced to dynamically guide the generation process with understanding features. Coupled with a rigorous data cleaning pipeline and a multi-stage training strategy, this architecture enables synergistic collaboration between the tasks while leveraging the strengths of diffusion models for superior generation. On two representative benchmarks, UniX achieves a 46.1% improvement in understanding performance (Micro-F1) and a 24.2% gain in generation quality (FD-RadDino), using only a quarter of the parameters of LLM-CXR. By achieving performance on par with task-specific models, our work establishes a scalable paradigm for synergistic medical image understanding and generation. Code and models are available at https://github.com/ZrH42/UniX.
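To make the cross-modal guidance idea concrete, the sketch below shows one plausible reading of it: features from the generation branch act as queries that attend over features from the understanding branch via scaled dot-product attention. This is a minimal, dependency-free illustration with hypothetical names (`cross_modal_attention`, `gen_feats`, `und_feats`), not UniX's actual implementation, which the abstract does not detail.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_modal_attention(gen_feats, und_feats):
    """Toy sketch (hypothetical, not the paper's code): each generation-branch
    feature vector is a query attending over understanding-branch feature
    vectors, which serve as both keys and values."""
    out = []
    for q in gen_feats:
        d = len(q)
        # Scaled dot-product scores between this query and all keys.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in und_feats]
        w = softmax(scores)
        # Attention-weighted sum of the value vectors, dimension by dimension.
        out.append([sum(wi * vi for wi, vi in zip(w, col))
                    for col in zip(*und_feats)])
    return out
```

In a real model the queries, keys, and values would pass through learned projections and multiple heads; the point here is only the direction of information flow, from the understanding branch into the generation branch.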