Fully supervised models are predominant in Bayesian active learning. We argue that their neglect of the information present in unlabelled data harms not just predictive performance but also decisions about what data to acquire. Our proposed solution is a simple framework for semi-supervised Bayesian active learning. We find it produces better-performing models than either conventional Bayesian active learning or semi-supervised learning with randomly acquired data. It is also easier to scale up than the conventional approach. As well as supporting a shift towards semi-supervised models, our findings highlight the importance of studying models and acquisition methods in conjunction.