One of the challenges in deploying a machine learning model is that its performance degrades as the operating environment changes. Streaming active learning addresses this by retraining the model on a newly annotated sample whenever the model's prediction for that sample is not sufficiently certain. Although many streaming active learning methods have been proposed for classification, little work has addressed regression problems, which are common in industrial applications. In this paper, we propose using the regression-via-classification framework for streaming active learning in regression. Regression-via-classification transforms a regression problem into a classification problem, so that streaming active learning methods developed for classification can be applied directly to regression. Experimental validation on four real-world datasets shows that the proposed method achieves higher regression accuracy at the same annotation cost.
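As a rough illustration of the idea described in the abstract, the following minimal sketch discretizes a continuous target into classes, uses class-probability confidence to decide whether to request an annotation, and maps the predicted class back to a continuous value. Everything in it is an assumption made for illustration and is not taken from the paper: the equal-width binning, the bin-midpoint decoding, the toy nearest-neighbour classifier, the confidence threshold, and all function names (`make_edges`, `to_class`, `to_value`, `class_probs`, `stream_step`).

```python
from bisect import bisect_right
from collections import Counter


def make_edges(y_min, y_max, n_bins):
    """Equal-width bin edges (an assumed discretization; others are possible)."""
    width = (y_max - y_min) / n_bins
    return [y_min + i * width for i in range(n_bins + 1)]


def to_class(y, edges):
    """Map a continuous target to a bin index in [0, n_bins - 1]."""
    k = bisect_right(edges, y) - 1
    return min(max(k, 0), len(edges) - 2)  # clamp values outside the range


def to_value(k, edges):
    """Map a bin index back to a continuous prediction (the bin midpoint)."""
    return 0.5 * (edges[k] + edges[k + 1])


def class_probs(x, train, n_neighbors=3):
    """Hypothetical stand-in classifier: class frequencies among the
    n_neighbors nearest training samples of a scalar feature x."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:n_neighbors]
    counts = Counter(label for _, label in nearest)
    return {label: c / len(nearest) for label, c in counts.items()}


def stream_step(x, oracle, train, edges, threshold=0.7):
    """Predict for one streamed sample; if the top class probability is
    below the threshold, query the oracle and add the sample to train."""
    probs = class_probs(x, train)
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    queried = confidence < threshold
    if queried:
        train.append((x, to_class(oracle(x), edges)))
    return to_value(label, edges), queried
```

In this sketch, `train` is a list of `(feature, class_index)` pairs and `oracle` is any callable that returns the true continuous target for a sample, standing in for a human annotator. A real system would retrain a proper classifier after each query; here the nearest-neighbour lookup simply sees the enlarged training list on the next call.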