This paper proposes a new active learning strategy for regression problems. The presented Wasserstein active regression model relies on distribution matching to measure how representative the labeled dataset is of the full data distribution. The Wasserstein distance is computed using GroupSort neural networks. These networks come with theoretical foundations that make it possible to quantify the approximation error with explicit bounds in terms of their size and depth. To complete the query strategy, this representativeness criterion is combined with an uncertainty-based approach that is more outlier-tolerant. Finally, the method is compared with classical and recent alternatives. The study shows empirically the relevance of such a representativeness-uncertainty approach, which provides good estimates throughout the query procedure. Moreover, Wasserstein active regression often achieves more precise estimates and tends to improve accuracy faster than other models.
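To make the two ingredients named in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: `group_sort` follows the usual definition of the GroupSort activation (sorting within consecutive groups of units), and `wasserstein1_1d` uses the exact closed form for one-dimensional empirical distributions in place of the neural estimate described in the abstract. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def group_sort(values: Sequence[float], group_size: int = 2) -> List[float]:
    """GroupSort activation: sort each consecutive group of units in ascending order.

    The output is a permutation of the input within each group, so the
    activation preserves norms (a property used to build 1-Lipschitz networks).
    """
    if group_size <= 0 or len(values) % group_size != 0:
        raise ValueError("len(values) must be a positive multiple of group_size")
    out: List[float] = []
    for start in range(0, len(values), group_size):
        out.extend(sorted(values[start:start + group_size]))
    return out


def wasserstein1_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Exact Wasserstein-1 distance between two 1-D empirical distributions.

    Assumption: uniform weights on each sample. In one dimension the distance
    equals the integral of |F_x(t) - F_y(t)|, where F is the empirical CDF.
    This closed form stands in for the neural estimate; it is not the paper's method.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    points = sorted(set(xs_sorted) | set(ys_sorted))
    total = 0.0
    i = j = 0  # number of samples <= current left endpoint
    for left, right in zip(points, points[1:]):
        while i < len(xs_sorted) and xs_sorted[i] <= left:
            i += 1
        while j < len(ys_sorted) and ys_sorted[j] <= left:
            j += 1
        cdf_gap = abs(i / len(xs_sorted) - j / len(ys_sorted))
        total += cdf_gap * (right - left)
    return total


# Toy representativeness check: which labeled subset better matches the pool?
pool = [0.0, 1.0, 2.0, 3.0]
clustered_labels = [0.0, 1.0]
spread_labels = [0.0, 3.0]
dist_clustered = wasserstein1_1d(pool, clustered_labels)
dist_spread = wasserstein1_1d(pool, spread_labels)
```

In this toy setting the spread-out labeled subset has the smaller distance to the pool, which is the sense in which a distribution-matching criterion favors representative labeled sets. In higher dimensions no such closed form is available, which is where an estimate based on 1-Lipschitz networks becomes useful.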