Continued pretraining is optimized with fixed self-supervised tasks but selected by downstream performance, creating a coarse feedback loop in which practitioners evaluate checkpoints, change data mixtures or objectives, and restart runs, while individual updates remain blind to target capabilities. We ask whether a small set of verifiable downstream examples can provide step-level feedback without directly supervising the learner. We introduce V-pretraining, which decouples a learner trained only with a self-supervised loss from a lightweight task designer that constructs targets or views for unlabeled batches. Given the current learner and batch, V-pretraining scores a candidate construction by predicting the first-order reduction in downstream loss after the induced self-supervised update. The designer maximizes this value; the learner then applies the update with targets or views detached, so downstream labels never update learner parameters. We instantiate V-pretraining as adaptive top-K soft targets for language modeling and learned views or masks for self-supervised vision. Across both modalities, V-pretraining improves target capabilities without degrading generalization. Under wall-clock-matched continued pretraining, it improves GSM8K Pass@1 for Qwen models using 1,024 GSM8K examples only as feedback, including a +7.4 point single-run gain for Qwen2.5-0.5B. In vision, it improves DINOv3 transfer to ADE20K semantic segmentation and NYUv2 depth estimation while preserving ImageNet linear accuracy, suggesting that feedback-guided task construction can improve target capabilities without collapsing general-purpose representations.
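The scoring rule described in the abstract can be illustrated with a toy sketch. For a plain SGD step with learning rate lr, the first-order predicted reduction in downstream loss is lr times the inner product of the downstream gradient and the self-supervised gradient induced by a candidate target. Everything below is an assumption made for illustration and is not the paper's implementation: the learner is a two-parameter linear model with squared-error losses, the designer simply enumerates a handful of scalar candidate targets instead of being a learned module, and all function names are hypothetical.

```python
# Toy sketch of value-guided target selection (illustrative assumptions only).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_loss(w, x, target):
    # Squared-error loss of a linear model w on input x.
    return 0.5 * (dot(w, x) - target) ** 2

def sq_grad(w, x, target):
    # Gradient of sq_loss with respect to w.
    residual = dot(w, x) - target
    return [residual * xi for xi in x]

def predicted_reduction(w, x_unlabeled, candidate, x_down, y_down, lr):
    # First-order estimate of how much the downstream loss drops after one
    # SGD step on the self-supervised loss built from `candidate`.
    g_ssl = sq_grad(w, x_unlabeled, candidate)
    g_down = sq_grad(w, x_down, y_down)
    return lr * dot(g_down, g_ssl)

def choose_target(w, x_unlabeled, candidates, x_down, y_down, lr):
    # Designer: pick the candidate target with the highest predicted value.
    scored = [(predicted_reduction(w, x_unlabeled, c, x_down, y_down, lr), c)
              for c in candidates]
    value, best = max(scored)
    return best, value

def learner_step(w, x_unlabeled, target, lr):
    # Learner: SGD on the self-supervised loss only. The target is a plain
    # constant here, so the downstream label never enters this update.
    g = sq_grad(w, x_unlabeled, target)
    return [wi - lr * gi for wi, gi in zip(w, g)]

if __name__ == "__main__":
    w = [0.0, 0.0]
    x_unlabeled, x_down, y_down, lr = [1.0, 1.0], [1.0, 0.0], 1.0, 0.1
    target, value = choose_target(w, x_unlabeled, [-1.0, 0.0, 1.0],
                                  x_down, y_down, lr)
    w_new = learner_step(w, x_unlabeled, target, lr)
    actual = sq_loss(w, x_down, y_down) - sq_loss(w_new, x_down, y_down)
    print("chosen target:", target)
    print("predicted reduction:", value, "actual reduction:", actual)
```

In this toy setting the predicted and actual reductions differ only by the second-order term that the first-order estimate ignores. The downstream example influences the update solely through which target the designer selects, which mirrors the separation the abstract describes.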