We propose a validation-free checkpointing signal computed from a single forward-backward pass: the Frobenius norm of the classifier-head gradient on one batch of detached features, ||g||_F = ||dL/dW||_F. Across ImageNet-1k CNNs and Transformers, this proxy correlates strongly negatively with Top-1 accuracy and positively with loss. Selecting the checkpoint with the minimum head-gradient norm within a short tail window closes most of the gap to the oracle (a 4.24% +/- 2.00% gap under a universal setup, about 1.12% with light per-family tuning). For practical deployment, head-scale normalization is more stable within classic CNN families (e.g., ResNets), while feature-scale normalization works well for Transformers and modern CNNs. The same one-batch probe also predicts COCO detection/segmentation mAP. For diffusion models (UNet/DDPM on CIFAR-10), it tracks training progress and enables near-oracle tail-window selection; it correlates positively with same-distribution probe MSE and negatively with FID (lower is better), so it serves as a lightweight, label-free monitor. Validation labels are never used beyond reporting. The probe costs far less than 0.1% of an epoch's compute and works as a drop-in for validation-free checkpoint selection and early stopping.
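The probe described above can be sketched in a few lines. The following is a minimal illustration, assuming a linear softmax classifier head W applied to detached features; the function names (`head_grad_norm`, `select_checkpoint`) are hypothetical, and the head-scale / feature-scale normalizations mentioned in the abstract are omitted here for brevity.

```python
import numpy as np

def head_grad_norm(features, labels, W):
    """One-batch probe: ||dL/dW||_F for softmax cross-entropy,
    with the backbone features detached (treated as constants).

    features: (B, d) detached feature batch
    labels:   (B,)   integer class labels for the probe batch
    W:        (d, C) classifier-head weight matrix
    """
    B = features.shape[0]
    logits = features @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
    p[np.arange(B), labels] -= 1.0                # p - one_hot(y)
    g = features.T @ p / B                        # analytic dL/dW, shape (d, C)
    return float(np.linalg.norm(g))               # Frobenius norm ||g||_F

def select_checkpoint(probe_values):
    """Tail-window selection: among candidate checkpoints, pick the
    index with the minimum head-gradient norm."""
    return int(np.argmin(probe_values))
```

In use, one would evaluate `head_grad_norm` once per candidate checkpoint in the tail window (one forward-backward pass each, no validation labels) and keep the checkpoint returned by `select_checkpoint`.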