The advent of Large Language Models (LLMs) has ushered in a new era of artificial intelligence, with the potential to transform various sectors through automation and insightful analysis. The Mixture of Experts (MoE) architecture has been proposed as a solution to enhance model performance on complex tasks. Yet existing MoE models struggle with task-specific learning and interpretability, especially in fields such as medicine where precision is critical. This paper introduces the Adaptive Task-Planning Mixture of Experts (AT-MoE), an innovative architecture designed to address these limitations. We first train task-specific experts via the LoRA approach to enhance problem-solving capabilities and interpretability in specialized areas. Subsequently, we introduce a layer-wise adaptive grouped routing module that optimizes module fusion based on complex task instructions, ensuring optimal task resolution. The grouped routing module first performs an overall weight allocation across expert groups, and then applies local weight normalization within each group. This design maintains multi-dimensional balance, controllability, and interpretability, while facilitating task-specific fusion in response to complex instructions.
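To make the two-stage routing concrete, below is a minimal PyTorch sketch of a grouped routing layer: it first allocates weight across expert groups, then normalizes weights locally within each group, and combines the two into per-expert weights. The class name `GroupedRouter` and all shapes are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedRouter(nn.Module):
    """Hypothetical two-stage grouped routing sketch: coarse allocation
    across expert groups, then local normalization within each group."""
    def __init__(self, hidden_dim: int, num_groups: int, experts_per_group: int):
        super().__init__()
        # Gate over expert groups (coarse, group-level allocation).
        self.group_gate = nn.Linear(hidden_dim, num_groups)
        # Gate over experts inside each group (fine, within-group allocation).
        self.expert_gate = nn.Linear(hidden_dim, num_groups * experts_per_group)
        self.num_groups = num_groups
        self.experts_per_group = experts_per_group

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim)
        # Stage 1: overall weight allocation across groups.
        group_w = F.softmax(self.group_gate(x), dim=-1)  # (batch, G)
        # Stage 2: local weight normalization within each group.
        logits = self.expert_gate(x).view(-1, self.num_groups, self.experts_per_group)
        local_w = F.softmax(logits, dim=-1)              # (batch, G, E)
        # Final per-expert weight = group weight * within-group weight.
        weights = group_w.unsqueeze(-1) * local_w        # (batch, G, E)
        return weights.flatten(1)                        # (batch, G*E)

# Example: route a batch of 4 token representations over 3 groups of 2 experts.
router = GroupedRouter(hidden_dim=16, num_groups=3, experts_per_group=2)
w = router(torch.randn(4, 16))
assert torch.allclose(w.sum(dim=-1), torch.ones(4))  # weights sum to 1
```

Under this reading, the resulting per-expert weights would scale the outputs of the corresponding LoRA experts, and a layer-wise variant would instantiate one such router per transformer layer.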