While deep learning has enabled significant advances in many areas of science, its black-box nature hinders architecture design for future artificial intelligence applications and the interpretation of predictions in high-stakes decision making. We address this issue by studying the fundamental question of how deep neural networks process data in their intermediate layers. Our finding is a simple, quantitative law that governs how deep neural networks trained for classification separate data according to class membership across all layers. The law shows that each layer improves data separation at a constant geometric rate, and its emergence is observed during training across a collection of network architectures and datasets. It offers practical guidelines for designing architectures, improving model robustness and out-of-sample performance, and interpreting predictions.
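One way to read "a constant geometric rate" is that some per-layer separation measure shrinks by roughly the same factor at every layer, so its logarithm falls on a straight line in the layer index. The sketch below illustrates that reading only. The abstract does not name the separation measure, so the symbols `D_l` (separation value after layer `l`, with smaller meaning better separated) and `rho` (the per-layer ratio) are assumptions introduced here, and the input values are synthetic rather than measured results.

```python
import math


def fit_geometric_rate(separation):
    """Fit D_l ~ D_0 * rho**l by least squares on log D_l.

    `separation` holds one positive value per layer, starting at layer 0.
    Both the measure and the convention that smaller values mean better
    separation are assumptions for illustration, not taken from the abstract.
    Returns (rho, d0): the estimated per-layer ratio and the fitted
    layer-0 value.
    """
    if len(separation) < 2 or any(d <= 0 for d in separation):
        raise ValueError("need at least two strictly positive values")

    layers = list(range(len(separation)))
    logs = [math.log(d) for d in separation]
    n = len(logs)

    # Ordinary least squares for log D_l = log D_0 + l * log rho.
    mean_l = sum(layers) / n
    mean_y = sum(logs) / n
    sxx = sum((l - mean_l) ** 2 for l in layers)
    sxy = sum((l - mean_l) * (y - mean_y) for l, y in zip(layers, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_l

    return math.exp(slope), math.exp(intercept)


# Synthetic example: values that halve at every layer (hypothetical data).
values = [8.0 * 0.5 ** l for l in range(6)]
rho, d0 = fit_geometric_rate(values)
```

If the fitted line describes the logged values well, the per-layer improvement is consistent with a single ratio `rho`; a clear curvature in the logged values would indicate that the rate is not constant across layers.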