Self-alignment is an effective way to reduce the cost of human annotation while maintaining promising model capability. However, most current methods complete the data collection and training steps in a single round, which may overlook the continuously improving ability of self-aligned models. This raises a key question: what if we perform multi-round bootstrapping self-alignment? Does this strategy enhance model performance or lead to rapid degradation? In this paper, we present a pioneering exploration of the impact of bootstrapping self-alignment on large language models. Our findings reveal that bootstrapping self-alignment markedly surpasses the single-round approach, provided that data diversity is guaranteed through in-context learning. To further exploit the capabilities of bootstrapping, we investigate and adjust the training order of the data, which yields improved model performance. Drawing on these findings, we propose Step-On-Feet Tuning (SOFT), which leverages the model's continuously enhanced few-shot ability to boost zero-shot and one-shot performance. Building on an easy-to-hard training recipe, we propose SOFT+, which further boosts self-alignment performance. Our experiments demonstrate the efficiency of SOFT (SOFT+) across various classification and generation tasks, highlighting the potential of bootstrapping self-alignment for continually enhancing model alignment.