In deep active learning, selecting multiple examples to label at each step is essential for efficiency, particularly on large datasets. However, existing Bayesian solutions to this problem, such as BatchBALD, are severely limited when selecting a large number of examples, because computing the mutual information of jointly distributed random variables has exponential complexity. We therefore present the Large BatchBALD algorithm, a well-grounded approximation to BatchBALD that aims to achieve comparable quality while being more computationally efficient. We provide a complexity analysis of the algorithm, showing a reduction in computation time, especially for large batches. Furthermore, we present an extensive set of experimental results on image and text data, covering both toy datasets and larger ones such as CIFAR-100.
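To make the exponential cost concrete, the following is a minimal sketch of an exact joint mutual information estimate between a batch of labels and the model parameters, computed from Monte Carlo posterior samples. It is not the Large BatchBALD algorithm, nor the implementation of any particular library. The function name `joint_mutual_information` and the array layout `(K, b, C)` (posterior samples, batch points, classes) are assumptions made for illustration only.

```python
import itertools

import numpy as np


def joint_mutual_information(probs):
    """Exact joint mutual information I(y_1..y_b; w) from posterior samples.

    probs has shape (K, b, C) (an assumed layout): probs[k, i, c] is the
    predicted probability of class c for batch point i under the k-th
    sampled set of model parameters.
    Returns the estimate and the number of label configurations visited.
    """
    K, b, C = probs.shape
    eps = 1e-12  # guards log(0)

    # Conditional entropy: labels are independent given the parameters,
    # so it is a sum of per-point entropies averaged over the K samples.
    cond_entropy = -(probs * np.log(probs + eps)).sum(axis=2).mean(axis=0).sum()

    # Joint entropy: every one of the C**b label configurations is visited.
    joint_entropy = 0.0
    n_configs = 0
    for labels in itertools.product(range(C), repeat=b):
        # p(y_1..y_b | w_k) factorizes over the batch points; shape (K,).
        per_sample = np.prod(probs[:, np.arange(b), list(labels)], axis=1)
        p = per_sample.mean()  # marginalize over the posterior samples
        joint_entropy -= p * np.log(p + eps)
        n_configs += 1

    return joint_entropy - cond_entropy, n_configs


# Two posterior samples that disagree completely on two binary points.
demo_probs = np.array([
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
])
demo_mi, demo_configs = joint_mutual_information(demo_probs)
```

The loop runs over `C**b` configurations, so with 10 classes a batch of 10 points would already require 10^10 terms. This growth is the limitation that motivates an approximation for large batches.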