With the recent success of large-scale models, emergence has received widespread attention from the research community. Unlike prior work, we hypothesize a key factor behind the performance gains that accompany increasing scale: the reduction of monosemantic neurons, i.e., neurons that form only one-to-one correlations with specific features. Monosemantic neurons tend to be sparser in large models and to hurt their performance. Inspired by this insight, we propose an intuitive approach: identify monosemantic neurons and inhibit them. Achieving this, however, is non-trivial, as there is no unified quantitative metric for monosemanticity, and simply banning monosemantic neurons does not promote polysemanticity in neural networks. We therefore first propose a new metric that measures the monosemanticity of neurons while remaining efficient enough for online computation, and then introduce a theoretically supported method that suppresses monosemantic neurons and proactively raises the proportion of polysemantic neurons during training. We validate our conjecture that monosemanticity drives performance changes across model scales on a variety of neural networks and benchmark datasets spanning language, image, and physics-simulation tasks. Further experiments support our analysis and theory on inhibiting monosemanticity.
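The abstract does not specify how its monosemanticity metric is computed. As a minimal illustrative sketch only (not the paper's method), one could score each neuron by the strongest Pearson correlation between its activations and any single candidate feature: a score near 1 suggests the neuron tracks one feature (monosemantic), while lower scores suggest mixed, polysemantic responses. The function name and the assumption that features are available as per-sample scalars are hypothetical.

```python
import numpy as np

def monosemanticity_scores(activations, features):
    """Hypothetical sketch: per-neuron max |Pearson correlation| with any feature.

    activations: (n_samples, n_neurons) array of neuron activations
    features:    (n_samples, n_features) array of candidate feature values
    Returns an (n_neurons,) array; values near 1 indicate a one-to-one
    correlation with a single feature, i.e., a monosemantic neuron.
    """
    # Standardize columns so a dot product divided by n gives Pearson correlation.
    a = (activations - activations.mean(axis=0)) / (activations.std(axis=0) + 1e-8)
    f = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    corr = a.T @ f / len(a)          # (n_neurons, n_features) correlation matrix
    return np.abs(corr).max(axis=1)  # best single-feature match per neuron
```

For example, a neuron whose activation equals one feature scores near 1, while a neuron responding to an equal mix of two independent features scores around 0.71.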