Deep Learning (DL), a variant of the neural network algorithms originally proposed in the 1980s, has made surprising progress in Artificial Intelligence (AI), in applications ranging from language translation, protein folding, and autonomous cars to, more recently, human-like language models (chatbots), all of which seemed intractable until very recently. Despite the growing use of DL networks, little is actually understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and, of course, the large scale of the data, since the underlying algorithms have changed little since 1987. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units cannot easily be visualized, nor can their mechanisms be easily revealed. In this paper, we explore these questions with a large (1.24M weights; VGG) DL network in a novel high-density sample task (5 unique tokens with at minimum 500 exemplars per token), which allows us to follow the emergence of category structure and feature construction more carefully. We use various visualization methods to follow the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction.