MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance

The deployment of large language models (LLMs) in real-world clinical applications is constrained by a fundamental trade-off between the computational cost of Transformer-based models and the efficiency of linear-time models. To address this, we propose MambaFormer, an LLM-based hybrid Mixture-of-Experts (MoE) framework for efficient medical question answering (QA) and clinical assistance. MambaFormer employs a lightweight gating mechanism that performs token-level dynamic routing, sending short, complex queries to a customized Transformer expert (ET5) and long, high-throughput sequences to a State Space Model expert (EMamba). Both experts are tailored to the input sequence dimensionality, embedding structure, sequence length, and target-specific output heads, and are fine-tuned via transfer learning on DentalQA, a new custom-designed dataset. Routing decisions are driven by the contextual complexity of token embeddings, the normalized sequence length, and domain-aware features, thereby enforcing a Pareto-optimal trade-off between inference latency and prediction accuracy. In addition, a novel utility-guided multi-objective loss jointly optimizes decisions, router parameters, routing behavior, expert utilization, and computational cost by adaptively regulating token-level expert activation. Finally, MambaFormer is evaluated for medical QA on the DentalQA and PubMedQA datasets using holdout validation and compared with state-of-the-art techniques. It outperforms these methods (BERTScore = 0.9180) with ultra-low latency (0.077 s), delivering a 24.4× speedup over T5-Large and offering a scalable solution for resource-constrained clinical deployment.
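The abstract describes the router and the utility-guided loss only at a high level. The Python sketch below illustrates one way a token-level gate and a multi-objective loss of this kind could fit together. Everything in it is an assumption for illustration: the three feature definitions, the single-layer sigmoid gate, the 0.5 routing threshold, the relative expert costs, the loss weights, and the names `TokenFeatures`, `gate_probability`, `route_tokens`, and `utility_loss` are hypothetical and do not reproduce the MambaFormer implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class TokenFeatures:
    # Hypothetical per-token router inputs mirroring the three signals named in
    # the abstract; how the paper actually computes them is not specified here.
    complexity: float    # contextual-complexity score of the token embedding, in [0, 1]
    norm_seq_len: float  # sequence length divided by a maximum length, in [0, 1]
    domain_score: float  # domain-aware feature (e.g. dental-term indicator), in [0, 1]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_probability(f: TokenFeatures, w: tuple, b: float) -> float:
    """Probability of sending this token to the Transformer expert (ET5).

    A single linear layer plus sigmoid stands in for the "lightweight gating
    mechanism"; the real router architecture is an assumption here.
    """
    z = w[0] * f.complexity + w[1] * f.norm_seq_len + w[2] * f.domain_score + b
    return sigmoid(z)


def route_tokens(features, w, b, threshold=0.5):
    """Hard token-level routing: 'ET5' if p >= threshold, else 'EMamba'."""
    probs = [gate_probability(f, w, b) for f in features]
    routes = ["ET5" if p >= threshold else "EMamba" for p in probs]
    return probs, routes


def utility_loss(task_loss, probs, cost_t5=1.0, cost_mamba=0.1,
                 target_t5_share=0.5, lam_cost=0.1, lam_bal=0.1):
    """Hypothetical multi-objective loss: task loss + expected compute cost + balance.

    - expected_cost: mean per-token cost under the soft gate probabilities, so
      routing more tokens to the costlier Transformer expert is penalised.
    - balance: squared gap between the average ET5 share and a target share,
      discouraging collapse onto a single expert.
    The weights and relative costs are illustrative placeholders, not paper values.
    """
    n = len(probs)
    expected_cost = sum(p * cost_t5 + (1 - p) * cost_mamba for p in probs) / n
    t5_share = sum(probs) / n
    balance = (t5_share - target_t5_share) ** 2
    return task_loss + lam_cost * expected_cost + lam_bal * balance


if __name__ == "__main__":
    w, b = (4.0, -4.0, 1.0), -0.5  # hypothetical router weights and bias
    tokens = [
        TokenFeatures(complexity=0.9, norm_seq_len=0.1, domain_score=0.8),  # short, complex
        TokenFeatures(complexity=0.2, norm_seq_len=0.9, domain_score=0.3),  # long, simple
    ]
    probs, routes = route_tokens(tokens, w, b)
    print(routes)  # → ['ET5', 'EMamba']
    print(round(utility_loss(task_loss=0.7, probs=probs), 4))
```

Using the soft gate probabilities in the cost and balance terms keeps the loss a smooth function of the router parameters, while the hard threshold is applied only when dispatching tokens. This is a common pattern in MoE routers, but it is not confirmed as the paper's choice.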

翻译：在现实世界临床应用中部署大型语言模型（LLM）受到计算成本与线性时间模型效率之间基本权衡的限制。为解决此问题，我们提出一种基于LLM的MambaFormer混合专家（MoE）框架，用于高效医疗问答（QA）与临床辅助。MambaFormer采用轻量级门控机制，对简短复杂查询执行令牌级动态路由至定制Transformer专家（ET5），或对长序列高吞吐量输入路由至状态空间模型专家（EMamba）。定制的EMamba与ET5模型经专门设计以适应输入序列维度、嵌入结构、序列长度及任务特定输出头，并通过迁移学习在新构建的定制DentalQA数据集上进行微调。此外，智能路由决策由令牌嵌入的上下文复杂度、归一化序列长度及领域感知特征驱动，从而在推理延迟与预测精度间实现帕累托最优权衡。进一步地，新颖的效用引导多目标损失函数通过自适应调节令牌级专家激活，联合优化决策、路由器参数、路由行为、专家利用率及计算成本。最后，所提出的MambaFormer在新构建的定制DentalQA与PubMedQA数据集上针对医疗QA任务进行交叉验证（留出法），并与前沿技术进行比较。实验表明，MambaFormer以超低延迟（0.077秒）获得优异性能（BERTScore = 0.9180），相较T5-Large实现24.4倍加速，为资源受限的临床部署提供了可扩展解决方案。