Modern machine learning algorithms, especially deep-learning-based techniques, typically require careful hyperparameter tuning to achieve their best performance. Despite intense interest in practical approaches to automating this laborious and compute-intensive task, such as Bayesian optimization and random search, the fundamental learning-theoretic complexity of tuning hyperparameters for deep neural networks is poorly understood. Motivated by this gap, we initiate a formal study of hyperparameter tuning complexity in deep learning through a recently introduced data-driven setting. We assume that we are given a series of deep learning tasks drawn from some distribution, and we must tune hyperparameters to perform well on average over that distribution. A major difficulty is that the utility, viewed as a function of the hyperparameter, is highly volatile; moreover, it is given only implicitly by an optimization problem over the model parameters. To tackle this challenge, we introduce a new technique for characterizing the discontinuities and oscillations of the utility function on any fixed problem instance as the hyperparameter varies; our analysis relies on subtle arguments drawing on tools from differential/algebraic geometry and constrained optimization. We use this to show that the learning-theoretic complexity of the corresponding family of utility functions is bounded. We instantiate our results and provide sample complexity bounds for concrete applications: tuning a hyperparameter that interpolates between neural activation functions, and setting the kernel parameter in graph neural networks.
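To make the interpolated-activation example concrete, here is a minimal sketch of one such hyperparameter family. The abstract does not specify the interpolation scheme, so this code assumes a simple linear interpolation between ReLU and the identity map; the function name `interpolated_activation` and the parameter `alpha` are illustrative, not the paper's notation.

```python
import numpy as np

def relu(x):
    """Standard rectified linear unit."""
    return np.maximum(x, 0.0)

def interpolated_activation(x, alpha):
    """Hypothetical one-parameter activation family: alpha = 1 recovers
    ReLU, alpha = 0 recovers the identity, and intermediate values of the
    hyperparameter alpha give a leaky-ReLU-like interpolation."""
    return alpha * relu(x) + (1.0 - alpha) * x

# For a fixed input, the activation is piecewise linear in x, and the
# trained network's utility depends on alpha only through the downstream
# optimization over model parameters -- the implicit dependence whose
# discontinuities and oscillations the analysis must control.
y = interpolated_activation(np.array([-2.0, 0.0, 3.0]), alpha=0.5)
```

Here `y` evaluates to `[-1.0, 0.0, 3.0]`: negative inputs are scaled by `1 - alpha` while nonnegative inputs pass through unchanged, so tuning `alpha` sweeps through a family of activation functions with a single scalar hyperparameter.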