Recently, there has been significant interest in the theoretical foundations of deep learning, especially its performance in classification tasks. Bayesian deep learning has emerged as a unified probabilistic framework that seeks to integrate deep learning with Bayesian methodology. However, a gap remains in the theoretical understanding of Bayesian approaches to deep learning for classification, and this study attempts to bridge it. Using PAC-Bayes bound techniques, we derive theoretical results on the prediction (misclassification) error of a probabilistic approach that employs Spike-and-Slab priors for sparse deep learning in classification. We establish non-asymptotic bounds on the prediction error and show that, under suitable network architectures, our results attain minimax-optimal rates in both low- and high-dimensional settings, up to a logarithmic factor; moreover, our logarithmic term yields a slight improvement over previous work. Finally, we propose and analyze an automated model selection procedure for choosing a network architecture with guaranteed optimality.