Active learning is a machine learning paradigm that aims to maximize model performance when labeled data is expensive to acquire. In this work, we propose SUPClust, a novel active learning method that seeks to identify points at the decision boundary between classes. By targeting these points, SUPClust aims to acquire the labels that are most informative for refining the model's predictions in complex decision regions. We demonstrate experimentally that labeling these points leads to strong model performance, and that this benefit holds even under strong class imbalance.