Recent advances in the understanding of neural networks suggest that superposition, the ability of a single neuron to represent multiple features simultaneously, is a key mechanism underlying the computational efficiency of large-scale networks. This paper explores the theoretical foundations of computing in superposition, focusing on explicit, provably correct algorithms and their efficiency. We present the first lower bounds showing that for a broad class of problems, including permutations and pairwise logical operations, a neural network computing in superposition requires at least $\Omega(m' \log m')$ parameters and $\Omega(\sqrt{m' \log m'})$ neurons, where $m'$ is the number of output features being computed. This implies that any ``lottery ticket'' sparse sub-network must have at least $\Omega(m' \log m')$ parameters, no matter how large the initial dense network is. Conversely, we show a nearly tight upper bound: logical operations such as pairwise AND can be computed using $O(\sqrt{m'} \log m')$ neurons and $O(m' \log^2 m')$ parameters. There is thus an exponential gap between computing in superposition, the subject of this work, and merely representing features in superposition, which can require as few as $O(\log m')$ neurons by the Johnson-Lindenstrauss Lemma. We hope these results open a path toward using complexity-theoretic techniques in neural network interpretability research.
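To make the representation side of this gap concrete, the following sketch (ours, not from the paper; all constants and names are illustrative) shows the standard Johnson-Lindenstrauss-style construction: `m_prime` features are stored as random near-orthogonal directions in only $O(\log m')$ dimensions, and any sparse set of active features can be superposed into a single vector and read back out by dot products. Computing functions of the superposed features, by the bounds above, requires far more neurons than this.

```python
# A minimal sketch of representation in superposition via random
# near-orthogonal feature directions (Johnson-Lindenstrauss style).
# The constant 32 in the embedding dimension is an arbitrary choice
# for illustration, not a bound from the paper.
import numpy as np

rng = np.random.default_rng(0)
m_prime = 1024                              # number of features m'
d = int(np.ceil(32 * np.log(m_prime)))      # O(log m') embedding dimension

# Random +-1/sqrt(d) directions: pairwise dot products concentrate near 0.
features = rng.choice([-1.0, 1.0], size=(m_prime, d)) / np.sqrt(d)

# Superpose a sparse set of active features into one d-dimensional state.
active = [3, 97, 500]
state = features[active].sum(axis=0)

# Read out each feature by projecting the state onto its direction:
# active features score ~1, inactive ones ~0 plus small cross-talk.
scores = features @ state
decoded = np.argsort(scores)[-len(active):]
print(sorted(decoded.tolist()))             # recovers [3, 97, 500] w.h.p.
```

With 1024 features packed into roughly 222 dimensions, the readout succeeds with high probability because the cross-talk between random directions is $O(1/\sqrt{d})$; this is what makes $O(\log m')$ neurons suffice for representation, in contrast to the $\Omega(\sqrt{m' \log m'})$ neurons the paper shows are needed for computation.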