This study addresses how to integrate diversity-based and uncertainty-based sampling strategies in active learning, particularly when working with self-supervised pre-trained models. We introduce TCM, a straightforward heuristic that mitigates the cold start problem while maintaining strong performance across different amounts of labeled data. TCM first applies TypiClust for diversity sampling and then transitions to uncertainty sampling with Margin, combining the strengths of both strategies. Our experiments show that TCM consistently outperforms existing methods across various datasets in both low and high data regimes.
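The two-phase idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `switch_step` parameter, and the rule for when to switch are hypothetical, since the text above does not specify the transition criterion. Cluster assignments are assumed to be precomputed on self-supervised embeddings, and typicality is taken as the inverse mean distance to the k nearest neighbors within a cluster.

```python
import math


def select_by_margin(probs, unlabeled, budget):
    """Pick the unlabeled points with the smallest top-1 minus top-2 probability gap."""
    scored = []
    for i in unlabeled:
        top = sorted(probs[i], reverse=True)
        scored.append((top[0] - top[1], i))  # small margin = high uncertainty
    scored.sort()
    return [i for _, i in scored[:budget]]


def typicality(embeddings, members, k):
    """Inverse mean distance to the k nearest neighbors inside one cluster."""
    scores = {}
    for i in members:
        dists = sorted(
            math.dist(embeddings[i], embeddings[j]) for j in members if j != i
        )
        nearest = dists[:k]
        mean = sum(nearest) / len(nearest) if nearest else 0.0
        scores[i] = 1.0 / (mean + 1e-12)
    return scores


def select_by_typicality(embeddings, cluster_ids, labeled, budget, k=3):
    """Pick the most typical point from each of the largest uncovered clusters."""
    clusters = {}
    for i, c in enumerate(cluster_ids):
        clusters.setdefault(c, []).append(i)
    labeled = set(labeled)
    # A cluster is "uncovered" if none of its points has been labeled yet.
    uncovered = [m for m in clusters.values() if not labeled.intersection(m)]
    uncovered.sort(key=len, reverse=True)
    picks = []
    for members in uncovered[:budget]:
        scores = typicality(embeddings, members, k)
        picks.append(max(members, key=lambda i: scores[i]))
    return picks


def tcm_select(step, switch_step, embeddings, cluster_ids, probs, labeled, budget, k=3):
    """Hypothetical schedule: diversity sampling before switch_step, Margin after."""
    if step < switch_step:
        return select_by_typicality(embeddings, cluster_ids, labeled, budget, k)
    labeled = set(labeled)
    unlabeled = [i for i in range(len(embeddings)) if i not in labeled]
    return select_by_margin(probs, unlabeled, budget)
```

In this sketch the early rounds need no classifier output at all, which is what makes a diversity-first phase attractive under cold start; the later rounds rely only on the model's predicted class probabilities.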