We introduce Yuan3.0 Flash, an open-source Mixture-of-Experts (MoE) multimodal large language model with 3.7B activated parameters and 40B total parameters, designed to enhance performance on enterprise-oriented tasks while maintaining competitive capabilities on general-purpose tasks. To address the overthinking phenomenon commonly observed in Large Reasoning Models (LRMs), we propose Reflection-aware Adaptive Policy Optimization (RAPO), a novel RL training algorithm that effectively regulates overthinking behavior. On enterprise-oriented tasks such as retrieval-augmented generation (RAG), complex table understanding, and summarization, Yuan3.0 Flash consistently achieves superior performance. It also demonstrates strong reasoning capabilities in domains such as mathematics and science, attaining accuracy comparable to frontier models while requiring only about one quarter to one half of the average tokens. Yuan3.0 Flash has been fully open-sourced to facilitate further research and real-world deployment: https://github.com/Yuan-lab-LLM/Yuan3.0.