Deep active learning (AL) selects batches of instances for annotation to avoid retraining deep neural networks (DNNs) after each new label. A naive top-$b$ selection, however, can yield a batch of redundant (similar) instances, which various AL strategies counteract with clustering techniques that enforce diversity within a batch. We approach this issue differently, replacing the costly retraining with an efficient Bayesian update. The proposed update performs a second-order optimization step using the Gaussian posterior from a last-layer Laplace approximation; since the inverse Hessian is available in closed form, the update has low computational complexity. We demonstrate that in typical AL settings our update closely approximates retraining while being considerably faster. Leveraging the update, we introduce a new framework for batch selection via sequential construction, updating the DNN after each label acquisition. Furthermore, we incorporate the update into a look-ahead selection strategy that serves as a feasible upper baseline approximating optimal batch selection. Our results highlight the potential of efficient updates to advance deep AL research.
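For intuition, the following is a minimal sketch of how such a closed-form last-layer update can look; the notation $\phi$, $g$, and $\lambda$ is illustrative and not taken from the paper. Given a last-layer Laplace posterior $\mathcal{N}(\mu, \Sigma)$ over the final-layer weights, a newly labeled instance with feature embedding $\phi$, loss gradient $g$ at $\mu$, and scalar loss curvature $\lambda > 0$ (a generalized Gauss-Newton term for a one-dimensional output; multiclass outputs yield a low-rank analogue), the Sherman-Morrison identity updates the covariance and a single Newton step updates the mean, with no retraining and no explicit matrix inversion:
\[
\Sigma' = \Sigma - \frac{\lambda\, \Sigma \phi \phi^\top \Sigma}{1 + \lambda\, \phi^\top \Sigma \phi}, \qquad \mu' = \mu - \Sigma'\, g .
\]
Here $\Sigma' = (\Sigma^{-1} + \lambda \phi \phi^\top)^{-1}$ is exactly the posterior covariance after incorporating the new instance's curvature, and the mean step uses only the new instance's gradient because $\mu$ is (approximately) a stationary point of the previous objective.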