Existing research has demonstrated that fine-tuning large language models (LLMs) on machine-generated instruction-following data gives these models impressive zero-shot capabilities on novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets with the aim of enhancing the Chinese conversational capabilities of the Mixtral-8x7B sparse Mixture-of-Experts (MoE) model. Through instruction fine-tuning on this carefully processed dataset, we construct "Aurora," a model built on the Mixtral-8x7B sparse MoE model. To assess the performance of Aurora, we use three widely recognized benchmarks: C-Eval, MMLU, and CMMLU. Empirical studies validate the effectiveness of instruction fine-tuning applied to the Mixtral-8x7B sparse MoE model. This work is a pioneering effort in instruction fine-tuning a sparse MoE model, marking a significant breakthrough in enhancing the capabilities of this model architecture. Our code, data, and model are publicly available at https://github.com/WangRongsheng/Aurora