We scrutinize the structural and operational aspects of deep learning models, focusing on the statistics, distributions, node interactions, and visualization of learnable parameters (weights). By establishing correlations between the variance in weight patterns and overall network performance, we investigate the optimal and suboptimal performance of various deep learning models. Our empirical analysis spans widely recognized datasets such as MNIST, Fashion-MNIST, and CIFAR-10, and diverse deep learning models including deep neural networks (DNNs), convolutional neural networks (CNNs), and vision transformers (ViTs), enabling us to pinpoint characteristics of learnable parameters that correlate with successful networks. Through extensive experiments across these architectures, we shed light on the critical factors that influence the functionality and efficiency of DNNs. Our findings reveal that successful networks, irrespective of dataset or model, are invariably similar to other successful networks in their converged weight statistics and distributions, whereas poorly performing networks vary in their weights. In addition, our research shows that the learnable parameters of widely differing deep learning models such as DNNs, CNNs, and ViTs exhibit similar learning characteristics.
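As a minimal sketch of the kind of per-layer weight statistics examined here, the snippet below collects the mean, standard deviation, and variance of each learnable parameter tensor in a model. The PyTorch framework, the toy model, and the specific statistics shown are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch (not the authors' code): per-layer weight statistics of a
# trained model, assuming a PyTorch implementation. The toy network and the
# chosen statistics (mean, std, var, min, max) are illustrative only.
import torch
import torch.nn as nn

model = nn.Sequential(              # stand-in for a trained DNN/CNN/ViT
    nn.Conv2d(1, 16, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 28 * 28, 10),
)

stats = {}
for name, param in model.named_parameters():
    if param.requires_grad:
        w = param.detach().flatten()
        stats[name] = {
            "mean": w.mean().item(),
            "std": w.std().item(),
            "var": w.var().item(),
            "min": w.min().item(),
            "max": w.max().item(),
        }

# Comparing these summaries across runs, datasets, and architectures is one way
# to probe whether well-performing networks converge to similar weight statistics.
for name, s in stats.items():
    print(f"{name}: mean={s['mean']:.4f}, std={s['std']:.4f}, var={s['var']:.6f}")
```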