Bayesian active learning builds on information-theoretic approaches that maximise the information new observations provide about the model parameters. This is commonly done by maximising the Bayesian Active Learning by Disagreement (BALD) acquisition function. However, we highlight that BALD is challenging to estimate when new data points are subject to censoring, where only clipped values of the targets are observed. To address this, we derive the entropy and the mutual information for censored distributions and, from them, the BALD objective for active learning in censored regression ($\mathcal{C}$-BALD). We propose a novel modelling approach to estimate the $\mathcal{C}$-BALD objective and use it for active learning in the censored setting. Across a wide range of datasets and models, we demonstrate that $\mathcal{C}$-BALD outperforms other Bayesian active learning methods in censored regression.
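To make the quantities named above concrete, the following is a minimal illustrative sketch, not the estimator proposed in the paper. It assumes a small ensemble whose members each predict a Gaussian $\mathcal{N}(\mu_k, \sigma_k^2)$ for the latent target, right-censoring at a known threshold `c` (so the observed value is $\min(y^*, c)$), and an entropy taken with respect to Lebesgue measure below `c` plus a point mass at `c`. All function names are hypothetical. The BALD-style score is the entropy of the ensemble-averaged censored predictive minus the average entropy of the individual censored predictives.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def norm_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / SQRT_2PI


def norm_cdf(u):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def norm_sf(u):
    """Standard normal survival function, 1 - CDF."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def censored_gaussian_entropy(mu, sigma, c):
    """Entropy (nats) of y = min(y*, c) with y* ~ N(mu, sigma^2).

    Assumption: the entropy is the differential part on y < c plus the
    discrete term for the probability mass sitting at y = c.
    """
    z = (c - mu) / sigma
    below = norm_cdf(z)
    # Closed form of -int_{-inf}^{c} p(y) log p(y) dy for a Gaussian density.
    continuous = below * math.log(sigma * SQRT_2PI) + 0.5 * (below - z * norm_pdf(z))
    p_cens = norm_sf(z)
    point = -p_cens * math.log(p_cens) if p_cens > 0.0 else 0.0
    return continuous + point


def censored_mixture_entropy(mus, sigmas, c, n_grid=4000):
    """Entropy of the equal-weight ensemble mixture after censoring at c.

    The continuous part has no closed form, so it is integrated with the
    trapezoid rule on a grid truncated 8 standard deviations into the tails.
    """
    k = len(mus)
    lo = min(m - 8.0 * s for m, s in zip(mus, sigmas))
    hi = min(c, max(m + 8.0 * s for m, s in zip(mus, sigmas)))
    continuous = 0.0
    if hi > lo:
        h = (hi - lo) / n_grid
        for i in range(n_grid + 1):
            y = lo + i * h
            p = sum(norm_pdf((y - m) / s) / s for m, s in zip(mus, sigmas)) / k
            if p > 0.0:
                w = 0.5 if i in (0, n_grid) else 1.0  # trapezoid end weights
                continuous -= w * h * p * math.log(p)
    p_cens = sum(norm_sf((c - m) / s) for m, s in zip(mus, sigmas)) / k
    point = -p_cens * math.log(p_cens) if p_cens > 0.0 else 0.0
    return continuous + point


def censored_bald(mus, sigmas, c):
    """BALD-style score: H[mean predictive] - mean H[member predictive]."""
    marginal = censored_mixture_entropy(mus, sigmas, c)
    expected = sum(
        censored_gaussian_entropy(m, s, c) for m, s in zip(mus, sigmas)
    ) / len(mus)
    return marginal - expected


if __name__ == "__main__":
    # Three ensemble members that disagree about the latent target.
    mus, sigmas = [-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]
    print("censored at c=0.5:", censored_bald(mus, sigmas, c=0.5))
    print("effectively uncensored:", censored_bald(mus, sigmas, c=50.0))
```

Under these assumptions the score is zero when all members agree and shrinks as censoring hides more of the region where the members disagree, which is the effect that makes a naive, censoring-unaware BALD estimate misleading.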