Emergence has recently received widespread attention from the research community alongside the success of large language models. Unlike prior work, we hypothesize that a key factor driving performance gains as models scale up is the reduction of monosemantic neurons, i.e., neurons that can only form one-to-one correlations with specific features. Monosemantic neurons tend to be sparser and to hurt performance in large models. Inspired by this insight, we propose a simple idea: identify monosemantic neurons and inhibit them. Achieving this goal is non-trivial, however, because there is no unified quantitative metric for monosemanticity, and simply banning monosemantic neurons does not promote polysemanticity in neural networks. We therefore propose to learn from emergence and, in this paper, present a study on proactively inhibiting monosemantic neurons. More specifically, we first propose a new metric that measures the monosemanticity of neurons and is efficient enough for online computation. We then introduce a theoretically supported method that suppresses monosemantic neurons and proactively increases the proportion of polysemantic neurons during training. On a variety of neural networks and benchmark datasets spanning language, image, and physics simulation tasks, we validate our conjecture that monosemanticity drives performance changes at different model scales. Further experiments support our analysis and theory regarding the inhibition of monosemanticity.
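The abstract does not define the proposed metric, so the following is only a rough illustration of what a monosemanticity score that is "efficient enough for online computation" could look like. It assumes that a monosemantic neuron fires strongly on one feature and stays near zero otherwise, so its running peak activation sits far above its running mean. The class `OnlineMonosemanticityTracker`, the peak-to-mean ratio, and the flagging threshold are hypothetical choices made for this sketch; they are not the paper's metric, and the paper's suppression method is not reproduced here.

```python
from typing import List, Sequence


class OnlineMonosemanticityTracker:
    """Hypothetical streaming proxy for monosemanticity (not the paper's metric).

    Keeps only a running mean and a running peak of |activation| per neuron,
    so memory is O(num_neurons) and each sample costs O(num_neurons) to fold in.
    """

    def __init__(self, num_neurons: int, eps: float = 1e-8) -> None:
        self.num_neurons = num_neurons
        self.eps = eps  # guards against division by zero for dead neurons
        self.count = 0
        self.mean = [0.0] * num_neurons
        self.peak = [0.0] * num_neurons

    def update(self, activations: Sequence[Sequence[float]]) -> None:
        """Fold one batch (rows = samples, columns = neurons) into the stats."""
        for row in activations:
            if len(row) != self.num_neurons:
                raise ValueError("activation row has the wrong number of neurons")
            self.count += 1
            for j, a in enumerate(row):
                mag = abs(a)
                # Incremental mean update: no past activations need to be stored.
                self.mean[j] += (mag - self.mean[j]) / self.count
                self.peak[j] = max(self.peak[j], mag)

    def scores(self) -> List[float]:
        """Peak-to-mean ratio per neuron; larger means more 'peaked' responses."""
        return [p / (m + self.eps) for p, m in zip(self.peak, self.mean)]

    def flag(self, threshold: float) -> List[int]:
        """Indices of neurons whose score exceeds a hypothetical threshold."""
        return [j for j, s in enumerate(self.scores()) if s > threshold]


if __name__ == "__main__":
    tracker = OnlineMonosemanticityTracker(num_neurons=2)
    # Neuron 0 fires on one of four samples; neuron 1 fires equally on all four.
    tracker.update([[4.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
    print(tracker.scores())  # roughly 4.0 for neuron 0 and 1.0 for neuron 1
    print(tracker.flag(threshold=2.0))  # -> [0]
```

Because each update touches only two numbers per neuron, a score of this kind could be refreshed during training (for example, from activations captured at each step) without keeping an activation history. The paper's actual metric and inhibition step may differ substantially from this proxy.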