Active learning is a machine learning paradigm that aims to improve model performance by strategically selecting which unlabeled data to query for labels. One effective selection strategy is based on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is. A sample's distance to the decision boundary is a natural measure of predictive uncertainty, but it is often intractable to compute, especially for the complex decision boundaries that arise in multiclass classification tasks. To address this issue, this paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label, together with an estimator of the LDM that is proven to be asymptotically consistent under mild assumptions. The estimator is computationally efficient and can be easily implemented for deep learning models using parameter perturbation. LDM-based active learning then queries the unlabeled data with the smallest LDM. Experimental results show that our LDM-based active learning algorithm obtains state-of-the-art overall performance on all considered datasets and deep architectures.
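To make the querying rule concrete, the following is a minimal sketch of how an LDM-style score could be estimated by parameter perturbation. It is an illustration, not the paper's estimator. It assumes a bias-free linear classifier in place of a deep network, isotropic Gaussian noise on the weights, and the unlabeled pool itself as the sample over which the probability of disagreement is measured. The names `predict`, `estimate_ldm`, `select_queries`, `sigmas`, and `n_draws` are hypothetical.

```python
import numpy as np


def predict(W, X):
    """Predicted class labels of a linear classifier with weights W (d x k)."""
    return np.argmax(X @ W, axis=1)


def estimate_ldm(W, X_pool, sigmas=(0.01, 0.03, 0.1, 0.3, 1.0), n_draws=200, seed=0):
    """Estimate an LDM-style score for every sample in X_pool.

    Assumption: for each perturbed model, the probability of disagreement
    with the base model is approximated by the fraction of pool samples
    whose predicted label changes. A sample's score is the smallest such
    fraction among the perturbed models that flip that sample's label.
    """
    rng = np.random.default_rng(seed)
    base = predict(W, X_pool)
    # 1.0 means no sampled model disagreed with the base prediction.
    ldm = np.ones(len(X_pool))
    for sigma in sigmas:
        for _ in range(n_draws):
            # Perturb the parameters with Gaussian noise of scale sigma.
            W_pert = W + sigma * rng.standard_normal(W.shape)
            disagree = predict(W_pert, X_pool) != base
            rho = disagree.mean()  # empirical disagreement probability
            # Keep the smallest disagreement seen for each flipped sample.
            ldm[disagree] = np.minimum(ldm[disagree], rho)
    return ldm


def select_queries(ldm, batch_size):
    """Indices of the batch_size pool samples with the smallest score."""
    return np.argsort(ldm)[:batch_size]
```

In this sketch, a sample close to the decision boundary is flipped by models that differ only slightly from the base model, so it receives a small score and is queried first. A sample far from the boundary is flipped only by models that disagree on a large share of the pool, or by none of the sampled models at all.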