A Highly Efficient Diversity-based Input Selection for DNN Improvement Using VLMs

Maintaining or improving the performance of Deep Neural Networks (DNNs) through fine-tuning requires labeling newly collected inputs, a process that is often costly and time-consuming. To alleviate this problem, input selection approaches have been developed in recent years to identify small yet highly informative subsets for labeling. Diversity-based selection is one of the most effective of these approaches. However, diversity-based methods are often computationally intensive and scale poorly to large input sets, which limits their practical applicability. To address this challenge, we introduce Concept-Based Diversity (CBD), a highly efficient diversity metric for image inputs that leverages Vision-Language Models (VLMs). Our results show that CBD correlates strongly with Geometric Diversity (GD), an established diversity metric, while requiring only a fraction of its computation time. Building on this finding, we propose a hybrid input selection approach that combines CBD with Margin, a simple uncertainty metric. We conduct a comprehensive evaluation across a diverse set of DNN models, input sets, and selection budgets, comparing against five of the most effective state-of-the-art selection baselines. The results demonstrate that CBD-based selection consistently outperforms all baselines in selecting inputs that improve the DNN model. Furthermore, the CBD-based approach remains highly efficient, with selection times close to those of simple uncertainty-based methods such as Margin, even on larger input sets such as ImageNet. These results confirm not only the effectiveness and computational advantage of the CBD-based approach, particularly over hybrid baselines, but also its scalability in repetitive and large-scale input selection scenarios.
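
The abstract names three ingredients (Margin, GD, and a CBD-plus-Margin hybrid) without defining them. The sketch below illustrates how such pieces could fit together, under stated assumptions. Margin is the standard gap between the top-2 softmax probabilities. GD is shown in one common formulation, the log-determinant of the Gram matrix of L2-normalised feature vectors. The CBD stand-in and the greedy hybrid rule are hypothetical. They assume each input has already been tagged with a set of concept strings (for example by zero-shot matching with a VLM, which is not shown) and score diversity as the number of distinct concepts covered. All function names here are illustrative and are not the authors' implementation.

```python
import numpy as np


def margin_scores(probs: np.ndarray) -> np.ndarray:
    """Margin uncertainty: gap between the two highest softmax probabilities.
    A smaller margin means the model is less certain about the input."""
    top2 = np.sort(probs, axis=1)[:, -2:]  # ascending: [second-best, best]
    return top2[:, 1] - top2[:, 0]


def geometric_diversity(features: np.ndarray) -> float:
    """One common GD formulation: log-volume of the parallelotope spanned by
    the L2-normalised feature vectors, i.e. log det(V V^T)."""
    v = features / np.linalg.norm(features, axis=1, keepdims=True)
    sign, logdet = np.linalg.slogdet(v @ v.T)
    # A singular Gram matrix (linearly dependent features) has zero volume.
    return float(logdet) if sign > 0 else float("-inf")


def concept_coverage(concepts: list[set[str]]) -> int:
    """Hypothetical CBD-style score: number of distinct concepts covered."""
    return len(set().union(*concepts))


def hybrid_select(probs: np.ndarray, concepts: list[set[str]], budget: int) -> list[int]:
    """Hypothetical hybrid greedy selection: at each step pick the input that
    adds the most uncovered concepts, breaking ties by the smallest margin."""
    margins = margin_scores(probs)
    covered: set[str] = set()
    selected: list[int] = []
    remaining = set(range(len(concepts)))
    while remaining and len(selected) < budget:
        best = max(
            sorted(remaining),  # deterministic iteration order
            key=lambda i: (len(concepts[i] - covered), -margins[i]),
        )
        selected.append(best)
        covered |= concepts[best]
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    probs = np.array([[0.5, 0.4, 0.1], [0.9, 0.05, 0.05],
                      [0.34, 0.33, 0.33], [0.6, 0.3, 0.1]])
    concepts = [{"car", "road"}, {"car"}, {"dog"}, {"dog", "grass"}]
    print(hybrid_select(probs, concepts, budget=3))
```

Counting concepts needs only set operations over precomputed tags, while GD requires a determinant over the pairwise Gram matrix of the selected set. This difference is one plausible reason a concept-based score can be far cheaper to compute, though the paper's actual CBD definition may differ.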