The pretraining-finetuning paradigm has been widely adopted in vision and other fields, but the cost of annotating samples for finetuning remains high. Active finetuning addresses this by selecting the most useful samples to annotate within a limited budget. Traditional active learning methods often struggle in this setting because of the bias inherent in their batch selection. The recent active finetuning approach, in turn, concentrates on aligning the distribution of the selected subset with that of the overall data pool, and therefore addresses only diversity. In this paper, we propose a Bi-Level Active Finetuning framework that selects the samples for annotation in one shot and consists of two stages: core sample selection for diversity and boundary sample selection for uncertainty. The process begins by identifying pseudo-class centers, then applies a denoising method and an iterative strategy to select boundary samples in the high-dimensional feature space, all without relying on ground-truth labels. Comprehensive experiments provide both qualitative and quantitative evidence of the method's efficacy, and it outperforms all existing baselines.
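The two-stage, label-free selection described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation; every concrete choice in it is an assumption made for illustration: plain k-means stands in for pseudo-class center identification, a distance quantile (`noise_ratio`) stands in for the denoising method, and a greedy pick on the margin between the nearest and second-nearest centers stands in for the iterative boundary selection. The function names `pseudo_class_centers` and `select_samples` are hypothetical.

```python
import numpy as np


def pseudo_class_centers(features, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in); returns an (n, k) distance matrix."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave a center in place if its cluster is empty
                centers[j] = members.mean(axis=0)
    return np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)


def select_samples(features, budget, num_centers, noise_ratio=0.1):
    """Return (core, boundary) index arrays chosen without any labels."""
    features = np.asarray(features, dtype=float)
    if not 2 <= num_centers <= budget <= len(features):
        raise ValueError("need 2 <= num_centers <= budget <= number of samples")
    dists = pseudo_class_centers(features, num_centers)

    # Stage 1 (diversity): the sample closest to each pseudo-class center.
    core = np.unique(dists.argmin(axis=0))

    # Denoising (assumed rule): drop the samples farthest from every center.
    ordered = np.sort(dists, axis=1)
    nearest = ordered[:, 0]
    keep = nearest <= np.quantile(nearest, 1.0 - noise_ratio)
    keep[core] = False  # core samples are already selected

    # Stage 2 (uncertainty): a small margin between the two closest centers
    # is used here as a proxy for lying near a pseudo-class boundary.
    margin = ordered[:, 1] - ordered[:, 0]
    candidates = np.flatnonzero(keep)
    boundary = []
    remaining = budget - len(core)
    while len(boundary) < remaining and len(candidates) > 0:
        pick = candidates[margin[candidates].argmin()]
        boundary.append(pick)
        candidates = candidates[candidates != pick]
    return core, np.array(boundary, dtype=int)
```

Given an `(n, d)` array of pretrained-model features, `select_samples(features, budget=12, num_centers=3)` returns two disjoint index arrays whose combined length equals the budget, provided enough candidates survive the denoising step. The single pass over the whole pool mirrors the one-shot selection the abstract describes, in contrast to batch-wise active learning loops.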