The advent of Large Language Models (LLMs) has ushered in a new era of artificial intelligence, with the potential to transform various sectors through automation and insightful analysis. The Mixture of Experts (MoE) architecture has been proposed as a solution to enhance model performance on complex tasks. Yet existing MoE models struggle with task-specific learning and interpretability, especially in fields such as medicine where precision is critical. This paper introduces the Adaptive Task-Planning Mixture of Experts (AT-MoE), an innovative architecture designed to address these limitations. We first train task-specific experts via the LoRA approach to enhance problem-solving capabilities and interpretability in specialized areas. Subsequently, we introduce a layer-wise adaptive grouped routing module that optimizes module fusion based on complex task instructions, ensuring optimal task resolution. The grouped routing module first performs an overall weight allocation across expert groups, and then applies local weight normalization within each group. This design maintains multi-dimensional balance, controllability, and interpretability, while facilitating task-specific fusion in response to complex instructions.
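To make the two-stage routing concrete, below is a minimal PyTorch sketch of a grouped routing layer: it first allocates weight across expert groups, then normalizes weights locally within each group, and combines the two into per-expert weights. The class name `GroupedRouter` and all shapes are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedRouter(nn.Module):
    """Hypothetical two-stage grouped routing sketch: coarse allocation
    across expert groups, then local normalization within each group."""
    def __init__(self, hidden_dim: int, num_groups: int, experts_per_group: int):
        super().__init__()
        # Gate over expert groups (coarse, group-level allocation).
        self.group_gate = nn.Linear(hidden_dim, num_groups)
        # Gate over experts inside each group (fine, within-group allocation).
        self.expert_gate = nn.Linear(hidden_dim, num_groups * experts_per_group)
        self.num_groups = num_groups
        self.experts_per_group = experts_per_group

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim)
        # Stage 1: overall weight allocation across groups.
        group_w = F.softmax(self.group_gate(x), dim=-1)  # (batch, G)
        # Stage 2: local weight normalization within each group.
        logits = self.expert_gate(x).view(-1, self.num_groups, self.experts_per_group)
        local_w = F.softmax(logits, dim=-1)              # (batch, G, E)
        # Final per-expert weight = group weight * within-group weight.
        weights = group_w.unsqueeze(-1) * local_w        # (batch, G, E)
        return weights.flatten(1)                        # (batch, G*E)

# Example: route a batch of 4 token representations over 3 groups of 2 experts.
router = GroupedRouter(hidden_dim=16, num_groups=3, experts_per_group=2)
w = router(torch.randn(4, 16))
assert torch.allclose(w.sum(dim=-1), torch.ones(4))  # weights sum to 1
```

Under this reading, the resulting per-expert weights would scale the outputs of the corresponding LoRA experts, and a layer-wise variant would instantiate one such router per transformer layer.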