Dataset distillation is the task of synthesizing a small dataset from a large one such that a model trained on the small dataset retains predictive accuracy comparable to one trained on the original, uncompressed dataset. Despite significant empirical progress in recent years, there is little understanding of the theoretical limitations and guarantees of dataset distillation: specifically, what excess risk does distillation incur relative to the original dataset, and how large must a distilled dataset be? In this work, we take a theoretical view of kernel ridge regression (KRR) based methods of dataset distillation, such as Kernel Inducing Points. By transforming ridge regression in random Fourier features (RFF) space, we provide the first proof of the existence of small distilled datasets, together with their corresponding excess risk, for shift-invariant kernels. We prove that there exists a small set of instances in the original input space whose solution in the RFF space coincides with the solution for the original data. We further show that this distilled set of instances can be used to generate a KRR solution that approximates the KRR solution optimized on the full input data. The size of this set is linear in the dimension of the RFF space of the input set or, alternatively, near-linear in the number of effective degrees of freedom, which is a function of the kernel, the number of data points, and the regularization parameter $\lambda$. The error bound of this distilled set is also a function of $\lambda$. We verify our bounds analytically and empirically.
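To make the objects named in the abstract concrete, the following is a minimal NumPy sketch, not the distillation construction itself. It assumes a Gaussian kernel (one example of a shift-invariant kernel), synthetic data, and the unnormalized convention $d_{\mathrm{eff}}(\lambda) = \mathrm{tr}\big(K (K + \lambda I)^{-1}\big)$ for the effective degrees of freedom; some references scale $\lambda$ by the number of data points instead. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Shift-invariant kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def rff_map(X, W, b):
    # Random Fourier features z(x) = sqrt(2/D) * cos(W^T x + b).
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def krr_fit(K, y, lam):
    # Dual KRR coefficients: alpha = (K + lam I)^{-1} y.
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def rff_ridge_fit(Z, y, lam):
    # Ridge regression in RFF space: w = (Z^T Z + lam I)^{-1} Z^T y.
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

def effective_dof(K, lam):
    # Assumed convention: d_eff = tr(K (K + lam I)^{-1}) = sum_i mu_i / (mu_i + lam).
    mu = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return float(np.sum(mu / (mu + lam)))

# Synthetic data (illustrative values, not taken from the paper).
rng = np.random.default_rng(0)
n, d, D, sigma, lam = 200, 2, 2000, 1.0, 1.0
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# For the Gaussian kernel, frequencies are drawn from N(0, sigma^{-2} I)
# and phases uniformly from [0, 2 pi).
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

K = gaussian_kernel(X, X, sigma)
Z = rff_map(X, W, b)
K_hat = Z @ Z.T  # kernel matrix induced by the random features

pred_krr = K @ krr_fit(K, y, lam)          # exact KRR fit on the full data
pred_rff = Z @ rff_ridge_fit(Z, y, lam)    # ridge regression in RFF space
pred_dual = K_hat @ krr_fit(K_hat, y, lam) # KRR with the approximate kernel

kernel_err = float(np.max(np.abs(K_hat - K)))
rel_gap = float(np.linalg.norm(pred_krr - pred_rff) / np.linalg.norm(pred_krr))
d_eff = effective_dof(K, lam)

print(f"max kernel approximation error: {kernel_err:.4f}")
print(f"relative prediction gap (KRR vs RFF ridge): {rel_gap:.4f}")
print(f"effective degrees of freedom at lambda={lam}: {d_eff:.2f} (n={n})")
```

Ridge regression in the RFF space is a $D$-dimensional linear problem whose fitted values coincide with KRR under the approximate kernel $\hat{K} = Z Z^\top$, which is why the RFF dimension is a natural quantity to measure set sizes against. Increasing `lam` shrinks `d_eff`, consistent with the abstract's statement that the effective degrees of freedom depend on the kernel, the number of data points, and $\lambda$.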