Vision-language models have advanced the development of universal models, yet their application in medical imaging remains constrained by task-specific functional requirements and limited data. Current general-purpose models are typically designed with task-specific branches and heads, which restricts the shared feature space and the flexibility of the model. To address these challenges, we develop a decomposed-composed universal medical imaging paradigm (UniMed) that supports tasks at all levels. To this end, we first propose a decomposed decoder that predicts two types of outputs, pixel and semantic, based on a defined input queue. We further introduce a composed decoder that unifies the input and output spaces and standardizes task annotations across different levels into a discrete token format. The coupled design of these two components enables the model to flexibly combine tasks and allows tasks to benefit from one another. Moreover, our joint representation learning strategy effectively leverages large amounts of unlabeled data with an unsupervised loss, achieving efficient one-stage pretraining and more robust performance. Experimental results show that UniMed achieves state-of-the-art performance on eight datasets across all three tasks and exhibits strong zero-shot and 100-shot transferability. We will release the code and trained models upon acceptance.
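To make the decomposed-decoder idea concrete, the following is a minimal PyTorch sketch of a decoder that maps a shared query queue to the two output types named above: per-pixel masks and discrete semantic tokens. All names, dimensions, and layer choices here are illustrative assumptions, not UniMed's released implementation.

```python
import torch
import torch.nn as nn


class DecomposedDecoder(nn.Module):
    """Hypothetical sketch: one query queue, two output branches.

    Assumptions (not from the paper): a transformer decoder over learned
    queries, a mask-embedding pixel branch, and a linear token head that
    emits logits over a discrete token vocabulary.
    """

    def __init__(self, dim=256, num_queries=100, vocab_size=512, mask_dim=256):
        super().__init__()
        # Learned query queue shared by both output types.
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        # Pixel branch: each query becomes a mask embedding that is
        # dot-producted with per-pixel features to form a mask logit map.
        self.mask_embed = nn.Linear(dim, mask_dim)
        # Semantic branch: each query is classified into a discrete token,
        # mirroring the unified discrete-token output space described above.
        self.token_head = nn.Linear(dim, vocab_size)

    def forward(self, image_feats, pixel_feats):
        # image_feats: (B, N, dim) flattened encoder features used as memory.
        # pixel_feats: (B, mask_dim, H, W) high-resolution per-pixel features.
        b = image_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        q = self.decoder(q, image_feats)                       # (B, Q, dim)
        masks = torch.einsum("bqc,bchw->bqhw",
                             self.mask_embed(q), pixel_feats)  # pixel output
        tokens = self.token_head(q)                            # semantic output
        return masks, tokens


# Usage: decode 100 queries into mask logits and token logits.
dec = DecomposedDecoder()
feats = torch.randn(2, 196, 256)      # e.g. 14x14 grid of encoder tokens
pix = torch.randn(2, 256, 56, 56)
masks, tokens = dec(feats, pix)
print(masks.shape, tokens.shape)      # (2, 100, 56, 56), (2, 100, 512)
```

Because both branches read the same decoded queries, pixel-level and semantic-level supervision update a shared feature space, which is the property the decomposed-composed design relies on for cross-task benefit.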