Large Language Models (LLMs) have advanced significantly, but the common learning paradigm treats them as passive information repositories and neglects their potential for active learning and alignment. Some approaches train LLMs on their own generated synthetic data, exploring the possibility of active alignment. However, a large gap remains between these one-time alignment methods and the continuous, automatic alignment that humans exhibit. In this paper, we introduce \textbf{I-SHEEP}, an \textbf{I}terative \textbf{S}elf-En\textbf{H}anc\textbf{E}m\textbf{E}nt \textbf{P}aradigm. This human-like paradigm enables LLMs to \textbf{continuously self-align from scratch with nothing}. Compared to the one-time alignment method Dromedary \cite{sun2023principledriven}, which corresponds to the first iteration in this paper, I-SHEEP significantly enhances capabilities on both Qwen and Llama models. Over subsequent iterations on the Qwen-1.5 72B model, I-SHEEP achieves a maximum relative improvement of 78.2\% on Alpaca Eval and 24.0\% on MT Bench, and an absolute increase of 8.88\% in IFEval accuracy. Additionally, I-SHEEP surpasses the base model on various standard benchmark generation tasks, achieving an average improvement of 24.77\% on code generation tasks, 12.04\% on TriviaQA, and 20.29\% on SQuAD. We also provide new insights based on the experimental results. Our code, datasets, and models are available at \textbf{https://anonymous.4open.science/r/I-SHEEP}.