Transformer-based Language Models have become ubiquitous in Natural Language Processing (NLP) due to their impressive performance on various tasks. However, expensive training and inference remain a significant impediment to their widespread applicability. While enforcing sparsity at various levels of the model architecture has shown promise in addressing scaling and efficiency issues, there remains a disconnect between these approaches and how sparsity affects network topology. Inspired by brain neuronal networks, we explore sparsity through the lens of network topology. Specifically, we exploit mechanisms seen in biological networks, such as preferential attachment and redundant synapse pruning, and show that principled, model-agnostic sparsity approaches are performant and efficient across diverse NLP tasks, spanning both classification (e.g., natural language inference) and generation (summarization, machine translation), even though optimizing for performance is not our sole objective. NeuroPrune is competitive with (and sometimes superior to) baselines on performance, can be up to $10\times$ faster to train for a given level of sparsity, and in many cases exhibits measurable improvements in inference time.
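To make the two biological mechanisms concrete, the following is a minimal, hypothetical PyTorch sketch of what degree-weighted (preferential-attachment-style) pruning and redundant-synapse removal could look like on a single weight matrix. The function names and scoring rules here are illustrative assumptions for exposition, not the paper's actual NeuroPrune algorithm.

```python
import torch


def preferential_attachment_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the lowest-scoring entries of a weight matrix.

    Each connection |w_ij| is scored by the 'degrees' (total absolute
    weight) of the two neurons it joins, so edges between well-connected
    neurons are preferentially kept -- a rough analogue of preferential
    attachment. (Illustrative scoring rule, not the paper's.)
    """
    degree_out = weight.abs().sum(dim=1, keepdim=True)  # per-row degree
    degree_in = weight.abs().sum(dim=0, keepdim=True)   # per-column degree
    scores = weight.abs() * degree_out * degree_in
    k = int(sparsity * weight.numel())                  # number of entries to drop
    if k == 0:
        return weight
    threshold = scores.flatten().kthvalue(k).values     # k-th smallest score
    mask = (scores > threshold).to(weight.dtype)
    return weight * mask


def redundant_synapse_prune(weight: torch.Tensor, cos_threshold: float = 0.95) -> torch.Tensor:
    """Zero out rows whose weights nearly duplicate an earlier row,
    mimicking the removal of redundant synapses."""
    normed = torch.nn.functional.normalize(weight, dim=1)
    sim = normed @ normed.t()                           # pairwise cosine similarity
    keep = torch.ones(weight.shape[0], dtype=torch.bool)
    for i in range(weight.shape[0]):
        if keep[i]:
            dup = sim[i] > cos_threshold
            dup[: i + 1] = False                        # never prune row i or earlier rows
            keep &= ~dup
    out = weight.clone()
    out[~keep] = 0.0
    return out


if __name__ == "__main__":
    w = torch.randn(64, 64)
    w = preferential_attachment_prune(w, sparsity=0.8)
    w = redundant_synapse_prune(w)
    print(f"nonzero fraction: {(w != 0).float().mean():.3f}")
```

In this sketch both operations are applied post hoc to a dense matrix; a training-time variant would instead reapply such masks periodically during optimization, which is where the reported training-time savings for a given sparsity level would come from.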