Can a small amount of verified goal information steer the expensive self-supervised pretraining of foundation models? Standard pretraining optimizes a fixed proxy objective (e.g., next-token prediction), which can misallocate compute away from downstream capabilities of interest. We introduce V-Pretraining: a value-based, modality-agnostic method for controlled continued pretraining in which a lightweight task designer reshapes the pretraining task to maximize the value of each gradient step. For example, consider self-supervised learning (SSL) with sample augmentation. The V-Pretraining task designer selects pretraining tasks (e.g., augmentations) for which the pretraining-loss gradient is aligned with a gradient computed over a downstream task (e.g., image segmentation). This helps steer pretraining towards relevant downstream capabilities. Notably, the pretrained model is never updated on downstream task labels; they are used only to shape the pretraining task. Under matched learner update budgets, V-Pretraining of 0.5B--7B language models improves reasoning (GSM8K test Pass@1) by up to 18% (relative) over standard next-token prediction while using only 12% of GSM8K training examples as feedback. In vision SSL, we improve state-of-the-art ADE20K results by up to 1.07 mIoU and reduce NYUv2 RMSE while also improving ImageNet linear accuracy, and we provide pilot evidence of improved token efficiency in continued pretraining.
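As a rough illustration only (not the authors' implementation), the gradient-alignment idea behind the task designer can be sketched in PyTorch-style Python as follows. The function names (flat_grad, select_task), the candidate-augmentation interface, and the cosine-similarity scoring rule are all assumptions for exposition; the only point carried over from the abstract is that each candidate pretraining task is scored by how well its pretraining-loss gradient aligns with a downstream-task gradient, and that downstream labels shape the task but never update the model.

import torch
import torch.nn.functional as F

def flat_grad(loss, params):
    # Flatten d(loss)/d(params) into a single vector; unused params contribute zeros.
    grads = torch.autograd.grad(loss, params, allow_unused=True)
    return torch.cat([
        g.reshape(-1) if g is not None else torch.zeros_like(p).reshape(-1)
        for g, p in zip(grads, params)
    ])

def select_task(model, unlabeled_batch, candidate_augs,
                pretrain_loss_fn, downstream_loss_fn, downstream_batch):
    # Hypothetical task-designer step: pick the augmentation whose pretraining
    # gradient best aligns with the downstream-task gradient.
    params = [p for p in model.parameters() if p.requires_grad]

    # Downstream gradient from a small labeled feedback set; it is only used
    # for scoring, never applied as an update to the model.
    g_down = flat_grad(downstream_loss_fn(model, downstream_batch), params)

    best_aug, best_score = None, -float("inf")
    for aug in candidate_augs:
        g_pre = flat_grad(pretrain_loss_fn(model, aug(unlabeled_batch)), params)
        score = F.cosine_similarity(g_pre, g_down, dim=0).item()  # "value" of this gradient step
        if score > best_score:
            best_aug, best_score = aug, score
    return best_aug  # the learner then takes a standard SSL step on best_aug(unlabeled_batch)

In this sketch the learner's update budget is unchanged: the designer only chooses which pretraining task the next (self-supervised) gradient step is taken on.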