The question of what makes a data distribution suitable for deep learning is a fundamental open problem. Focusing on locally connected neural networks (a prevalent family of architectures that includes convolutional and recurrent neural networks as well as local self-attention models), we address this problem by adopting theoretical tools from quantum physics. Our main theoretical result states that a certain locally connected neural network is capable of accurate prediction over a data distribution if and only if the data distribution admits low quantum entanglement under certain canonical partitions of features. As a practical application of this result, we derive a preprocessing method for enhancing the suitability of a data distribution to locally connected neural networks. Experiments with widespread models over various datasets demonstrate our findings. We hope that our use of quantum entanglement will encourage further adoption of tools from physics for formally reasoning about the relation between deep learning and real-world data.
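To make the notion of entanglement under a partition of features concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard definition of entanglement entropy: arrange a tensor into a matrix whose rows are indexed by one side of the partition, then take the entropy of the normalized squared singular values. The function name `entanglement_entropy`, the toy tensor, and the two partitions are illustrative assumptions; the abstract does not specify which partitions count as canonical.

```python
import numpy as np


def entanglement_entropy(tensor, left_axes):
    """Entanglement entropy of `tensor` across the partition
    (left_axes, remaining axes). Hypothetical helper for illustration."""
    left_axes = list(left_axes)
    right_axes = [a for a in range(tensor.ndim) if a not in left_axes]
    rows = int(np.prod([tensor.shape[a] for a in left_axes]))
    # Matricize: rows indexed by left_axes, columns by the remaining axes.
    matrix = np.transpose(tensor, left_axes + right_axes).reshape(rows, -1)
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    # Normalized squared singular values form a probability distribution.
    probs = singular_values**2 / np.sum(singular_values**2)
    probs = probs[probs > 1e-12]  # drop numerical zeros before taking logs
    return float(-np.sum(probs * np.log(probs)))


# Toy tensor over four binary features in which feature 0 is tied to
# feature 2 and feature 1 is tied to feature 3 (an assumed example).
entangled = np.einsum("ik,jl->ijkl", np.eye(2), np.eye(2))

# Partition {0, 1} vs {2, 3} cuts through both tied pairs.
contiguous = entanglement_entropy(entangled, [0, 1])

# Partition {0, 2} vs {1, 3} keeps each tied pair on one side.
interleaved = entanglement_entropy(entangled, [0, 2])

print(contiguous, interleaved)
```

In this toy case the first partition yields the maximal entropy for a 4-by-4 matricization (log 4), while the second yields zero. The same tensor can therefore look highly entangled or not at all, depending on how its features are grouped.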