Current self-training methods for large language models (LLMs) tend to under-sample challenging queries, leading to insufficient learning on difficult problems and thereby limiting model capability. To address this, this work proposes a difficulty-aware self-training (DAST) framework that focuses on improving both the quantity and quality of self-generated responses to challenging queries during self-training. DAST comprises three components: 1) sampling-based difficulty level estimation, 2) difficulty-aware data augmentation, and 3) a self-training algorithm applying SFT and DPO respectively. Experiments on mathematical tasks demonstrate the effectiveness and generalizability of DAST, highlighting the critical role of difficulty-aware strategies in advancing LLM self-training.
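To make the first component concrete, below is a minimal sketch of what sampling-based difficulty estimation could look like, assuming difficulty is approximated by the model's failure rate over k sampled responses per query; the abstract does not specify the exact formulation, and the callables `generate` and `is_correct` are hypothetical placeholders for the model's sampler and an answer checker.

```python
from typing import Callable, List


def estimate_difficulty(
    query: str,
    generate: Callable[[str, int], List[str]],  # hypothetical: samples k responses for a query
    is_correct: Callable[[str], bool],          # hypothetical: checks a response against the reference answer
    k: int = 16,
) -> float:
    """Approximate query difficulty as the failure rate over k sampled responses.

    Illustrative sketch, not the paper's exact method: difficulty is
    1 - (correct responses / k), so harder queries score near 1.0 and
    easy ones near 0.0.
    """
    responses = generate(query, k)
    num_correct = sum(1 for r in responses if is_correct(r))
    return 1.0 - num_correct / k
```

A score like this could then drive the second component, e.g. by allocating more augmentation and sampling budget to queries whose estimated difficulty exceeds a threshold.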