Mixture-of-Experts (MoE) architectures have gained increasing attention in the study of Large Vision-Language Models (LVLMs). They replace the dense model with a sparse one, achieving comparable performance while activating fewer parameters at inference time and thus significantly reducing inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and therefore employ a router to predict the routing of each token. However, these predictions are based solely on sample features and do not reflect the tokens' true optimization directions. This can lead to severe optimization conflicts between different tokens within an expert. To address this problem, this paper proposes a novel method based on token-level gradient analysis. Specifically, we first use token-level gradients to identify conflicting tokens in experts. Then, we add a specialized loss tailored to eliminate conflicts among tokens within each expert. Our method can serve as a plug-in for diverse Large Vision-Language Models, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.
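The abstract does not specify how token-level gradients are turned into a conflict criterion; a common choice in gradient-conflict analysis is cosine similarity between a token's gradient and the aggregate gradient of its expert. The sketch below illustrates that idea under this assumption (the function name, threshold, and mean-gradient reference are all hypothetical, not the paper's definition):

```python
import numpy as np

def find_conflicting_tokens(token_grads, threshold=0.0):
    """Flag tokens whose gradient conflicts with the expert's mean gradient.

    token_grads: (num_tokens, dim) array of per-token gradients w.r.t. one
    expert's parameters. A token is treated as 'conflicting' if the cosine
    similarity between its gradient and the mean gradient of all tokens
    routed to that expert falls below `threshold` (a negative similarity
    means the token pulls the expert in an opposing optimization direction).
    """
    mean_grad = token_grads.mean(axis=0)
    norms = np.linalg.norm(token_grads, axis=1) * np.linalg.norm(mean_grad)
    cos = token_grads @ mean_grad / np.maximum(norms, 1e-12)
    return cos < threshold

# Example: two tokens pull the expert one way, a third pulls the opposite way.
grads = np.array([[1.0, 0.0],
                  [0.9, 0.1],
                  [-1.0, 0.0]])
print(find_conflicting_tokens(grads))  # only the third token is flagged
```

Tokens flagged this way could then be targeted by a conflict-elimination loss (e.g., encouraging them to be re-routed), which is the role the abstract assigns to the specialized loss.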