Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, but their further advancement is limited by the lack of high-quality training data. Moreover, traditional training approaches rely heavily on expert-labeled data, setting an upper limit on LLM performance. To address this issue, we propose LANCE, a novel paradigm that enables LLMs to train themselves by autonomously generating, cleaning, reviewing, and annotating data with preference information. Our approach demonstrates that LLMs can serve as continuous self-evolving data engineers, significantly reducing the time and cost of constructing post-training data. Through iterative fine-tuning of different Qwen2 variants, we validate the effectiveness of LANCE across various tasks, showing that it can continuously improve model performance while maintaining high-quality data generation. Across eight benchmark dimensions, LANCE yields an average score improvement of 3.36 points for Qwen2-7B and 2.70 points for Qwen2-7B-Instruct. This training paradigm with autonomous data construction not only reduces reliance on human experts or external models but also ensures that the data aligns with human values and preferences, paving the way for future superintelligent systems that exceed human capabilities.
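The generate-clean-review-annotate loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: every function name and the length-based scoring heuristic are hypothetical placeholders, and a real pipeline would invoke an LLM where the stubs below return canned text.

```python
# Hypothetical sketch of a LANCE-style self-evolving data loop.
# All names and heuristics below are illustrative assumptions.

def generate_samples(model, n):
    # Stub: the model autonomously proposes instruction-response pairs.
    return [(f"instruction {i}", f"response {i}") for i in range(n)]

def clean(samples):
    # Stub cleaning step: drop malformed or empty pairs.
    return [(q, a) for q, a in samples if q and a]

def review_and_score(model, samples):
    # Stub review: the model scores its own outputs
    # (here, trivially, by response length).
    return [(q, a, float(len(a))) for q, a in samples]

def annotate_preferences(scored):
    # Pair a higher-scored response with a lower-scored one to form
    # (chosen, rejected) preference data.
    ranked = sorted(scored, key=lambda t: t[2], reverse=True)
    half = len(ranked) // 2
    return [
        {"prompt": c[0], "chosen": c[1], "rejected": r[1]}
        for c, r in zip(ranked[:half], ranked[half:])
    ]

def self_evolve(model, iterations=3, batch=8):
    # One iteration = generate -> clean -> review -> annotate,
    # followed by fine-tuning on the resulting preference data.
    dataset = []
    for _ in range(iterations):
        samples = clean(generate_samples(model, batch))
        prefs = annotate_preferences(review_and_score(model, samples))
        dataset.extend(prefs)
        # model = fine_tune(model, prefs)  # e.g. SFT plus preference tuning
    return dataset

data = self_evolve(model=None)
```

Each iteration produces preference-annotated pairs without any human labeling, which is what lets the loop run continuously and keep the resulting data aligned with the preference signal the model itself assigns.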