Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It replaces the dense model with a sparse one, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, typically employing a router to predict the routing of each token. However, these predictions are based solely on sample features and do not truly reflect the optimization directions of tokens. This may lead to severe optimization interference between different tokens assigned to the same expert. To address this problem, this paper proposes a novel method based on token-level gradient analysis, i.e., Solving Token Gradient Conflict (STGC). Specifically, we first use token-level gradients to identify conflicting tokens within experts. We then introduce a specialized loss tailored to eliminate conflicts among tokens within each expert. Our method can serve as a plug-in for diverse Large Vision-Language Models, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.
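To make the core idea concrete, the following is a minimal sketch of token-level conflict detection. It assumes, for illustration only, that a token routed to an expert "conflicts" when the cosine similarity between its gradient and the mean gradient of the other tokens routed to the same expert is negative; the paper's actual criterion and loss may differ, and the function and threshold here are hypothetical.

```python
import numpy as np

def find_conflicting_tokens(token_grads, threshold=0.0):
    """Flag tokens whose gradient opposes the rest of the expert's tokens.

    token_grads: (num_tokens, dim) array of per-token gradients for
    tokens routed to one expert. A token is flagged as conflicting when
    the cosine similarity between its gradient and the mean gradient of
    the remaining tokens falls below `threshold` (a simplifying
    assumption for illustration).
    """
    conflicts = []
    for i in range(len(token_grads)):
        # Average gradient of all other tokens assigned to this expert.
        others = np.delete(token_grads, i, axis=0).mean(axis=0)
        g = token_grads[i]
        cos = g @ others / (np.linalg.norm(g) * np.linalg.norm(others) + 1e-8)
        if cos < threshold:
            conflicts.append(i)
    return conflicts

# Toy example: two roughly aligned token gradients and one opposing one.
grads = np.array([[1.0, 0.0], [1.0, 0.1], [-0.2, 0.0]])
print(find_conflicting_tokens(grads))  # the third token (index 2) conflicts
```

In a real LVLM, the per-token gradients would come from backpropagating the training loss through the expert's parameters for each token separately; a conflict-elimination loss could then, for example, encourage the router to reassign flagged tokens, though the specific loss design is the paper's contribution and is not reproduced here.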