Calibrated Dataset Condensation for Faster Hyperparameter Search

Dataset condensation can be used to reduce the computational cost of training multiple models on a large dataset by condensing the training dataset into a small synthetic set. State-of-the-art approaches rely on matching the model gradients between the real and synthetic data. However, there is no theoretical guarantee of the generalizability of the condensed data: data condensation often generalizes poorly across hyperparameters/architectures in practice. This paper considers a different condensation objective specifically geared toward hyperparameter search. We aim to generate a synthetic validation dataset so that the validation-performance rankings of the models, with different hyperparameters, on the condensed and original datasets are comparable. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which obtains the synthetic validation dataset by matching the hyperparameter gradients computed via implicit differentiation and efficient inverse Hessian approximation. Experiments demonstrate that the proposed framework effectively maintains the validation-performance rankings of models and speeds up hyperparameter/architecture search for tasks on both images and graphs.
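The hyperparameter gradient mentioned above can be illustrated on a toy problem. The following is a minimal sketch, not the HCDC implementation: it assumes a ridge-regression model whose only hyperparameter is the regularization strength `lam`, and it approximates the inverse Hessian with a truncated Neumann series, which is one common choice and may differ from the approximation used in the paper. All function names and the toy data are hypothetical. By the implicit function theorem, the hypergradient is d L_val / d lam = -(grad_w L_val)^T H^{-1} w, where H is the Hessian of the training loss at the fitted weights w.

```python
# Toy hypergradient via implicit differentiation (illustrative assumptions only).
# Training loss: 0.5 * ||X w - y||^2 + 0.5 * lam * ||w||^2
# Validation loss: 0.5 * ||Xv w - yv||^2

def matvec(A, v):
    """Return A v for a list-of-rows matrix A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rmatvec(A, r):
    """Return A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def grad_sq(X, y, w):
    """Gradient of 0.5 * ||X w - y||^2 with respect to w."""
    resid = [p - t for p, t in zip(matvec(X, w), y)]
    return rmatvec(X, resid)

def fit(X, y, lam, lr=0.1, steps=3000):
    """Minimize the training loss by plain gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = grad_sq(X, y, w)
        w = [wi - lr * (gi + lam * wi) for wi, gi in zip(w, g)]
    return w

def hvp(X, lam, p):
    """Hessian-vector product H p with H = X^T X + lam * I."""
    Hp = rmatvec(X, matvec(X, p))
    return [h + lam * pi for h, pi in zip(Hp, p)]

def neumann_inverse_hvp(X, lam, g, alpha=0.1, terms=500):
    """Approximate H^{-1} g by alpha * sum_{k=0}^{terms} (I - alpha * H)^k g.

    Converges only if alpha is below 2 / (largest eigenvalue of H).
    """
    p = list(g)
    acc = list(g)
    for _ in range(terms):
        Hp = hvp(X, lam, p)
        p = [pi - alpha * hi for pi, hi in zip(p, Hp)]
        acc = [a + pi for a, pi in zip(acc, p)]
    return [alpha * a for a in acc]

def val_loss(Xv, yv, w):
    """Validation loss 0.5 * ||Xv w - yv||^2."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(matvec(Xv, w), yv))

def hypergradient(X, y, Xv, yv, lam):
    """d val_loss / d lam at the fitted weights, via the implicit function theorem."""
    w = fit(X, y, lam)
    g_val = grad_sq(Xv, yv, w)
    v = neumann_inverse_hvp(X, lam, g_val)  # v is approximately H^{-1} g_val
    # The mixed derivative of the training gradient with respect to lam is w.
    return -sum(vi * wi for vi, wi in zip(v, w))

# Hypothetical toy data.
X_TRAIN = [[1.0, 0.0], [0.5, 1.0], [1.0, 1.0]]
Y_TRAIN = [1.0, 2.0, 2.5]
X_VAL = [[1.0, 0.5], [0.0, 1.0]]
Y_VAL = [1.4, 1.6]

if __name__ == "__main__":
    print(hypergradient(X_TRAIN, Y_TRAIN, X_VAL, Y_VAL, lam=0.1))
```

In a condensation setting, one would compute such a hypergradient once on the real validation set and once on the synthetic one, then update the synthetic data to bring the two closer. That matching step is not shown here.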

