In many CLIP adaptation methods, a blending ratio hyperparameter controls the trade-off between general pretrained CLIP knowledge and the limited, dataset-specific supervision from the few-shot examples. Most few-shot CLIP adaptation techniques either report results after tuning the blending ratio on the test set or require additional validation sets to select it per dataset, and thus are not strictly few-shot. We present Hold-One-Shot-Out (HOSO), a simple, validation-free method for learning the blending ratio that allows CLIP-Adapter-style methods to compete in the newly established validation-free setting. CLIP-Adapter with HOSO (HOSO-Adapter) learns the blending ratio on a one-shot hold-out set while the adapter trains on the remaining few-shot support examples. Under the validation-free few-shot protocol, HOSO-Adapter outperforms the CLIP-Adapter baseline by more than 4 percentage points on average across 11 standard few-shot datasets. Interestingly, in the 8- and 16-shot settings, HOSO-Adapter outperforms CLIP-Adapter even when the latter's optimal blending ratio is selected on the test set. Ablation studies validate the one-shot hold-out mechanism and decoupled training, and demonstrate improvements over a naively learned blending-ratio baseline. Code is released at: https://github.com/chris-vorster/HOSO-Adapter
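To make the core idea concrete, the following is a minimal sketch of the hold-one-shot-out selection, not the paper's implementation: it assumes precomputed zero-shot CLIP logits and adapter logits on a held-out one-shot set (one example per class), and stands in a simple grid search over the blending ratio for the learned ratio described above. All array shapes and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def blend_logits(clip_logits, adapter_logits, alpha):
    # CLIP-Adapter-style blend: alpha weighs the adapter's output against
    # the frozen zero-shot CLIP logits.
    return alpha * adapter_logits + (1.0 - alpha) * clip_logits

def hoso_select_alpha(clip_logits, adapter_logits, labels, alphas):
    # Choose the blending ratio that maximizes accuracy on the held-out
    # one-shot set (grid search stands in for the paper's learned ratio).
    best_alpha, best_acc = alphas[0], -1.0
    for a in alphas:
        preds = blend_logits(clip_logits, adapter_logits, a).argmax(axis=1)
        acc = float((preds == labels).mean())
        if acc > best_acc:
            best_alpha, best_acc = a, acc
    return best_alpha

# Toy held-out one-shot set: 5 classes, one example each (synthetic logits).
n_classes = 5
labels = np.arange(n_classes)
clip_logits = rng.normal(size=(n_classes, n_classes))
# Pretend the adapter, trained on the remaining support shots, is informative.
adapter_logits = clip_logits + 2.0 * np.eye(n_classes)

alpha = hoso_select_alpha(clip_logits, adapter_logits, labels,
                          alphas=np.linspace(0.0, 1.0, 21))
print(alpha)
```

The point of the split is that the adapter never sees the held-out shots, so the selected ratio is not fit to the same examples the adapter memorized, and no extra validation set is needed.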