RadLite: Multi-Task LoRA Fine-Tuning of Small Language Models for CPU-Deployable Radiology AI

Large language models (LLMs) show promise in radiology, but their computational requirements preclude deployment in resource-constrained clinical environments. We investigate whether small language models (SLMs) of 3-4 billion parameters can achieve strong multi-task radiology performance through LoRA fine-tuning, enabling deployment on consumer-grade CPUs. We train Qwen2.5-3B-Instruct and Qwen3-4B on 162K samples spanning 9 radiology tasks (RADS classification across 10 systems, impression generation, temporal comparison, radiology NLI, NER, abnormality detection, N/M staging, and radiology Q&A) compiled from 12 public datasets. Both models are evaluated on up to 500 held-out test samples per task with standardized metrics. Our key findings are: (1) LoRA fine-tuning dramatically improves performance over zero-shot baselines (RADS accuracy +53%, NLI +60%, N-staging +89%); (2) the two models exhibit complementary strengths: Qwen2.5 excels at structured generation tasks, while Qwen3 dominates extractive tasks; (3) a task-routed oracle ensemble combining both models achieves the best performance across all tasks; (4) few-shot prompting of the fine-tuned models hurts performance, demonstrating that LoRA adaptation is more effective than in-context learning for specialized domains; and (5) the models can be quantized to GGUF format (~1.8-2.4GB) for CPU deployment at 4-8 tokens/second on consumer hardware. Our work demonstrates that small, efficiently fine-tuned models, which we collectively call RadLite, can serve as practical multi-task radiology AI assistants deployable entirely on consumer hardware without GPU requirements. Code and models are available at https://github.com/RadioX-Labs/RadLite
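The task-routed ensemble in finding (3) can be illustrated with a minimal sketch. It assumes that routing simply sends each task to whichever model scored higher on that task's evaluation set, which is presumably what makes it an "oracle". The function name, task keys, model labels, and scores below are all hypothetical placeholders, not values or identifiers from the RadLite repository.

```python
from typing import Dict


def build_task_router(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each task to the model with the highest evaluation score.

    `scores` has the shape {task: {model_name: score}}. Ties go to the
    model listed first, because max() returns the first maximal key.
    """
    router: Dict[str, str] = {}
    for task, per_model in scores.items():
        router[task] = max(per_model, key=per_model.get)
    return router


# Placeholder scores for illustration only; NOT results from the paper.
example_scores = {
    "impression_generation": {"qwen2.5-3b": 0.6, "qwen3-4b": 0.5},
    "ner": {"qwen2.5-3b": 0.7, "qwen3-4b": 0.8},
}

router = build_task_router(example_scores)
```

At inference time, a request tagged with a task name would be dispatched to `router[task]`. Because the routing table is derived from evaluation scores, the ensemble's per-task score is by construction the maximum of the two models' scores.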
