Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps. However, establishing a reliable generalization measure for statistically singular models such as deep neural networks (DNNs) is difficult because such models violate the regularity conditions assumed by classical asymptotic theory. This study focuses on Takeuchi's information criterion (TIC) and investigates the conditions under which this classical measure can explain the generalization gaps of DNNs. Notably, our theory shows that TIC is applicable near the neural tangent kernel (NTK) regime. In a series of experiments, we trained more than 5,000 DNN models spanning 12 architectures, including large models (e.g., VGG-16), on four datasets, and estimated the corresponding TIC values to examine how well they track the generalization gap. We also evaluated several computationally feasible TIC approximation methods and assessed the accuracy they trade for this efficiency. Our experimental results indicate that the estimated TIC values correlate well with the generalization gap under conditions close to the NTK regime. However, we show both theoretically and empirically that this correlation disappears outside the NTK regime. Finally, we demonstrate that TIC prunes trials more effectively than existing methods in hyperparameter optimization.
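For readers unfamiliar with TIC: it estimates the generalization gap as tr(I J^{-1})/n, where I is the expected outer product of per-sample loss gradients, J is the expected Hessian of the loss, and both are evaluated at the fitted parameters. The following is a minimal sketch of this penalty for logistic regression, where both matrices are analytic; the synthetic data, fitting loop, and variable names are illustrative assumptions and do not reproduce the paper's DNN-scale approximation methods.

```python
# Minimal sketch of the TIC penalty tr(I J^{-1}) / n for logistic regression.
# Illustrative only: the paper's large-scale approximations are not shown here.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic, well-specified logistic-regression data (hypothetical setup).
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

# Fit theta by gradient descent on the mean negative log-likelihood (NLL).
theta = np.zeros(d)
for _ in range(2000):
    p = sigmoid(X @ theta)
    theta -= 0.5 * X.T @ (p - y) / n

# I = E[g g^T] from per-sample NLL gradients g_i = (p_i - y_i) x_i;
# J = E[H] from per-sample Hessians H_i = p_i (1 - p_i) x_i x_i^T.
p = sigmoid(X @ theta)
G = (p - y)[:, None] * X
I_hat = G.T @ G / n
J_hat = (X * (p * (1.0 - p))[:, None]).T @ X / n

# TIC penalty estimating the generalization gap in mean-NLL units.
# In the well-specified case I ~ J, so the trace is ~ d (recovering AIC).
tic_penalty = np.trace(I_hat @ np.linalg.inv(J_hat)) / n
print(f"estimated TIC penalty: {tic_penalty:.5f}")
```

Here the penalty comes out near d/n = 0.01, as expected when the model is well specified; for singular models such as DNNs, I and J can diverge sharply, which is precisely the regime the paper analyzes.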