Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often lack flexibility in handling partially annotated datasets and extensibility to new classes, owing to limitations in one-hot encoding, architectural design, and learning scheme. To overcome these limitations, we propose a universal, extensible framework that enables a single model, termed Universal Model, to handle multiple public datasets and adapt to new classes (e.g., organs and tumors). First, we introduce a novel language-driven parameter generator that leverages language embeddings from large language models, providing richer semantic encoding than one-hot encoding. Second, we replace the conventional output layers with lightweight, class-specific heads, which allow Universal Model to simultaneously segment 25 organs and six types of tumors and ease the addition of new classes. We train Universal Model on 3,410 CT volumes assembled from 14 publicly available datasets and then test it on 6,173 CT volumes from four external datasets. Universal Model ranks first on six CT tasks in the Medical Segmentation Decathlon (MSD) public leaderboard and achieves leading performance on the Beyond The Cranial Vault (BTCV) dataset. In summary, Universal Model is computationally efficient (6x faster than dataset-specific models), generalizes well across different hospitals, transfers well to numerous downstream tasks, and, more importantly, facilitates extension to new classes while alleviating catastrophic forgetting of previously learned classes. Code, models, and datasets are available at https://github.com/ljwztc/CLIP-Driven-Universal-Model
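To make the two mechanisms in the abstract more concrete, here is a minimal, simplified sketch. It is not the authors' implementation; the repository above holds that. The sketch assumes a flattened volume of per-voxel feature vectors from some decoder. It uses random vectors as stand-ins for the language embeddings a text encoder would produce. A single linear layer, conditioned on the class embedding and a globally pooled image feature, generates the weights of one 1x1x1 convolution per class, and each class is scored with its own sigmoid. The single linear layer and single convolution are simplifying assumptions; the real generator and heads may be deeper. The names `ParameterGenerator`, `segment`, and `masked_bce` are hypothetical. The sketch is written in pure Python only to stay self-contained; a practical version would use a tensor library.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class ParameterGenerator:
    """Hypothetical generator: one linear layer mapping [class embedding; pooled
    image feature] to the C weights and 1 bias of a 1x1x1 convolution head."""

    def __init__(self, embed_dim: int, feat_dim: int, seed: int = 0):
        rng = random.Random(seed)
        in_dim, out_dim = embed_dim + feat_dim, feat_dim + 1
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def __call__(self, class_embedding, global_feature):
        x = list(class_embedding) + list(global_feature)
        params = [sum(w * v for w, v in zip(row, x)) for row in self.W]
        return params[:-1], params[-1]  # (conv weights, bias)


def global_average_pool(voxel_features):
    # Mean of each feature channel over all voxels.
    n = len(voxel_features)
    return [sum(col) / n for col in zip(*voxel_features)]


def segment(voxel_features, class_embeddings, generator):
    """Score every class independently with its own generated head.

    voxel_features: list of per-voxel feature vectors (a flattened volume).
    Returns {class name: list of per-voxel probabilities}.
    """
    g = global_average_pool(voxel_features)
    probs = {}
    for name, emb in class_embeddings.items():
        w, b = generator(emb, g)
        # A 1x1x1 convolution reduces to a dot product at each voxel.
        probs[name] = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                       for x in voxel_features]
    return probs


def masked_bce(probs, labels, labeled_classes, eps: float = 1e-7):
    """Binary cross-entropy averaged only over the classes a dataset annotates."""
    total, count = 0.0, 0
    for name in labeled_classes:
        for p, y in zip(probs[name], labels[name]):
            p = min(max(p, eps), 1.0 - eps)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


# Toy usage: 10 voxels with 4 feature channels, 8-dim stand-in embeddings.
rng = random.Random(1)
voxels = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(10)]
embeddings = {name: [rng.gauss(0.0, 1.0) for _ in range(8)]
              for name in ("liver", "pancreas")}
generator = ParameterGenerator(embed_dim=8, feat_dim=4)
probs = segment(voxels, embeddings, generator)

# A dataset that labels only the liver contributes loss only for the liver.
liver_labels = {"liver": [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]}
loss = masked_bce(probs, liver_labels, labeled_classes=["liver"])

# Adding a class needs only a new embedding; existing class outputs are unchanged.
embeddings["kidney tumor"] = [rng.gauss(0.0, 1.0) for _ in range(8)]
extended = segment(voxels, embeddings, generator)
```

In this sketch, each class is scored by its own generated head instead of a shared softmax over a fixed one-hot label set. Adding a class therefore only requires a new embedding. A partially annotated dataset can be used for training by restricting the loss to the classes it labels, as `masked_bce` does.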

