Although pre-trained language models~(PLMs) have recently advanced research on mathematical reasoning, they are not specifically designed to be capable multi-task solvers. As a result, they incur a high cost for multi-task deployment (\eg one model copy per task) and show inferior performance on complex mathematical problems in practical applications. To address these issues, we propose \textbf{JiuZhang~2.0}, a unified Chinese PLM designed specifically for multi-task mathematical problem solving. Our idea is to maintain a moderate-sized model and employ \emph{cross-task knowledge sharing} to improve model capacity in the multi-task setting. Specifically, we construct a Mixture-of-Experts~(MoE) architecture for modeling mathematical text, so as to capture the mathematical knowledge common across tasks. To optimize the MoE architecture, we design \emph{multi-task continual pre-training} and \emph{multi-task fine-tuning} strategies for multi-task adaptation. These training strategies can effectively decompose the knowledge in the task data and establish cross-task sharing via the expert networks. To further improve the general capacity to solve different complex tasks, we leverage large language models~(LLMs) as complementary models that iteratively refine the solutions generated by our PLM via in-context learning. Extensive experiments demonstrate the effectiveness of our model.
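The abstract does not specify how inputs are dispatched to the experts. Purely as an illustration, the sketch below assumes standard top-$k$ softmax gating, a common MoE design that is not necessarily the one used in JiuZhang~2.0. A gate scores every expert for an input vector, the $k$ highest-scoring experts are applied, and their outputs are mixed with weights renormalized over the chosen experts. The names \texttt{moe\_layer}, \texttt{gate\_w}, and \texttt{experts} are hypothetical. Each toy expert is a single linear map rather than a full feed-forward network.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def moe_layer(x: Vector, gate_w: Matrix, experts: List[Matrix], k: int = 2) -> Vector:
    """Route one input vector x to its top-k experts and mix their outputs.

    Assumption (hypothetical): top-k softmax gating, with each expert a single
    linear map; real MoE experts are usually feed-forward networks.
    gate_w holds one row of routing weights per expert.
    """
    scores = matvec(gate_w, x)  # one routing logit per expert
    # Indices of the k highest-scoring experts (ties keep the lower index first).
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])  # renormalize over chosen experts
    out = [0.0] * len(experts[0])
    for w, i in zip(weights, top):
        y = matvec(experts[i], x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Toy usage: three hypothetical experts on 2-d inputs (identity, doubling, negation).
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
gate_w = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
print(moe_layer([1.0, 1.0], gate_w, experts, k=2))  # → [1.5, 1.5]
```

In this toy case the gate scores the experts $1$, $1$, and $-2$, so the first two experts are selected with equal weight $0.5$, and the output averages $[1, 1]$ and $[2, 2]$. Because inputs from every task pass through the same pool of experts while each input activates only $k$ of them, parameters can be shared across tasks without running every expert on every input. This is one way to read the \emph{cross-task knowledge sharing} idea above, under the stated assumptions.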