Auto-labeling is an important family of techniques that produce labeled training sets with minimal manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model's confidence scores above which the model can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. A natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, but such methods still fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the \emph{optimal} TBAL confidence function. We develop a tractable version of the framework to obtain \texttt{Colander} (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of \texttt{Colander} and compare it against methods designed for calibration. \texttt{Colander} achieves up to 60\% improvement in coverage over the baselines while maintaining auto-labeling error below 5\% and using the same amount of labeled data as the baselines.
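The thresholding step that TBAL relies on can be illustrated with a minimal sketch. This is not \texttt{Colander} and not the authors' procedure: it assumes a held-out labeled validation set, uses the plain empirical error rather than any confidence bound, and the names `find_threshold`, `auto_label`, and `max_error` are hypothetical, introduced only for illustration.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def find_threshold(
    scores: Sequence[float],
    correct: Sequence[bool],
    max_error: float = 0.05,
) -> Optional[float]:
    """Return the lowest confidence threshold t such that the validation
    points with score >= t have empirical error <= max_error.

    Returns None if no threshold meets the error target.
    Simplifying assumption: the raw empirical error is used as-is.
    """
    # Visit validation points from most to least confident.
    pairs = sorted(zip(scores, correct), key=lambda p: -p[0])
    best: Optional[float] = None
    errors = 0
    n = 0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # Add every point tied at score t, so the running counts
        # describe exactly the set {score >= t}.
        while i < len(pairs) and pairs[i][0] == t:
            errors += 0 if pairs[i][1] else 1
            n += 1
            i += 1
        # The error rate is not monotone in t, so keep scanning and
        # remember the lowest threshold that still meets the target.
        if errors / n <= max_error:
            best = t
    return best


def auto_label(
    scores: Sequence[float],
    predictions: Sequence[Hashable],
    threshold: Optional[float],
) -> List[Tuple[int, Hashable]]:
    """Return (index, predicted label) for unlabeled points whose
    confidence is at or above the threshold; the rest stay unlabeled."""
    if threshold is None:
        return []
    return [
        (i, y)
        for i, (s, y) in enumerate(zip(scores, predictions))
        if s >= threshold
    ]


if __name__ == "__main__":
    val_scores = [0.99, 0.95, 0.90, 0.80, 0.60, 0.55]
    val_correct = [True, True, True, False, True, False]
    t = find_threshold(val_scores, val_correct, max_error=0.0)
    labeled = auto_label([0.97, 0.50, 0.91], ["cat", "dog", "cat"], t)
    coverage = len(labeled) / 3
    print(t, labeled, coverage)
```

In this sketch, coverage is the fraction of unlabeled points that receive an automatic label. An overconfident model pushes incorrect predictions into the high-score region, which forces the threshold up and lowers coverage; that is the failure mode the abstract attributes to poorly chosen confidence functions.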