A Framework and Benchmark for Deep Batch Active Learning for Regression

From arXiv. Changes in v3: improvements in writing and other minor changes. Accompanying code is available at https://github.com/dholzmueller/bmdal_reg

The acquisition of labels for supervised learning can be expensive. To improve the sample-efficiency of neural network regression, we study active learning methods that adaptively select batches of unlabeled data for labeling. We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations and selection methods. Our framework encompasses many existing Bayesian methods based on Gaussian Process approximations of neural networks as well as non-Bayesian methods. Additionally, we propose to replace the commonly used last-layer features with sketched finite-width Neural Tangent Kernels, and to combine them with a novel clustering method. To evaluate different methods, we introduce an open-source benchmark consisting of 15 large tabular regression data sets. Our proposed method outperforms the state of the art on our benchmark, scales to large data sets, and works out-of-the-box without adjusting the network architecture or training code. We provide open-source code that includes efficient implementations of all kernels, kernel transformations, and selection methods, and can be used for reproducing our results.
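The pipeline described in the abstract (base kernel, then kernel transformation, then selection method) can be illustrated with a small NumPy sketch. It is not the authors' implementation and does not use the bmdal_reg API; every function name below is hypothetical. Assumptions: a toy one-hidden-layer ReLU network; per-example parameter gradients as features, whose inner products give a finite-width (empirical) NTK; a Gaussian random projection as one simple way to sketch those features; and greedy farthest-point ("max-distance") selection as a stand-in for the paper's proposed clustering method.

```python
import numpy as np


def predict(params, X):
    """Toy one-hidden-layer ReLU MLP with a scalar output (hypothetical)."""
    W1, b1, w2, b2 = params
    return np.maximum(X @ W1 + b1, 0.0) @ w2 + b2


def gradient_features(params, X):
    """Per-example gradient of the output w.r.t. all parameters.

    Inner products between rows give the empirical (finite-width) NTK.
    Column order: W1 (row-major), b1, w2, b2.
    """
    W1, b1, w2, b2 = params
    pre = X @ W1 + b1                      # pre-activations, shape (n, m)
    H = np.maximum(pre, 0.0)               # hidden activations
    g_pre = (pre > 0) * w2                 # d out / d pre, shape (n, m)
    # d out / d W1[i, j] = X[:, i] * g_pre[:, j]
    g_W1 = (X[:, :, None] * g_pre[:, None, :]).reshape(len(X), -1)
    g_b2 = np.ones((len(X), 1))
    return np.concatenate([g_W1, g_pre, H, g_b2], axis=1)


def make_sketch(dim, k, rng):
    """Gaussian random projection P with E[(P.T a) . (P.T b)] = a . b."""
    return rng.standard_normal((dim, k)) / np.sqrt(k)


def sq_dists(A, B):
    """Squared distances between rows of A and B, i.e.
    k(a, a) + k(b, b) - 2 k(a, b) for the linear kernel on features."""
    return (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T


def max_dist_select(F_pool, F_train, batch_size):
    """Greedy farthest-point selection: repeatedly take the pool point whose
    distance to the nearest labeled-or-already-selected point is largest."""
    min_d = sq_dists(F_pool, F_train).min(axis=1)
    selected = []
    for _ in range(batch_size):
        i = int(np.argmax(min_d))
        selected.append(i)
        min_d = np.minimum(min_d, sq_dists(F_pool, F_pool[i:i + 1])[:, 0])
        min_d[i] = -np.inf                 # never pick the same point twice
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m, k = 5, 32, 64
    params = (rng.standard_normal((d, m)) / np.sqrt(d), np.zeros(m),
              rng.standard_normal(m) / np.sqrt(m), 0.0)
    X_train = rng.standard_normal((20, d))   # already-labeled inputs
    X_pool = rng.standard_normal((500, d))   # unlabeled candidates
    G_train = gradient_features(params, X_train)
    G_pool = gradient_features(params, X_pool)
    P = make_sketch(G_train.shape[1], k, rng)  # same projection for both sets
    batch = max_dist_select(G_pool @ P, G_train @ P, batch_size=10)
    print("indices to label next:", batch)
```

Swapping `gradient_features` for the hidden activations `H` alone would give the last-layer-feature baseline the abstract mentions. The same projection `P` must be applied to both the labeled and pool sets so that distances stay comparable.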


Related content

Active learning


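Active learning is a subfield of machine learning (and, more generally, of artificial intelligence); in statistics it is also called query learning or optimal experimental design. The "learning module" and the "selection strategy" are the two basic and essential components of an active learning algorithm. In education, the same term has a different meaning: active learning is "a method of learning in which students are actively or experientially involved in the learning process and where there are different levels of active learning, depending on student involvement" (Bonwell & Eison 1991). Bonwell & Eison (1991) note that students engage in activities beyond passively listening to lectures. In a report from the Association for the Study of Higher Education (ASHE), the authors discuss a variety of methods for promoting active learning. They cite literature indicating that, to learn, students must do more than just listen: they must read, write, discuss, and engage in solving problems. This process involves three learning domains: knowledge, skills, and attitudes (KSA). This taxonomy of learning behaviors can be thought of as "the goals of the learning process." In particular, students must engage in higher-order thinking tasks such as analysis, synthesis, and evaluation.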
