Deep Learning (DL), a variant of the neural network algorithms originally proposed in the 1980s, has made surprising progress in Artificial Intelligence (AI), with applications ranging from language translation, protein folding, and autonomous cars to, more recently, human-like language models (chatbots), all of which seemed intractable until very recently. Despite the growing use of DL networks, little is actually understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architectures and, of course, of the data, since not much else has changed since 1987. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units cannot easily be visualized, nor can their mechanisms be easily revealed. In this paper, we explore these questions with a large (1.24M weights; VGG) DL network in a novel high-density sample task (5 unique tokens with at minimum 500 exemplars per token), which allows us to follow the emergence of category structure and feature construction more carefully. We use various visualization methods to follow the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction.