pNLP-Mixer: an Efficient all-MLP Architecture for Language

Large pre-trained language models based on the transformer architecture have drastically changed the natural language processing (NLP) landscape. However, deploying these models in on-device applications on constrained devices such as smart watches is impractical because of their size and inference cost. As an alternative to transformer-based architectures, recent work on efficient NLP has shown that weight-efficient models can attain competitive performance on simple tasks, such as slot filling and intent classification, with model sizes on the order of one megabyte. This work introduces the pNLP-Mixer architecture, an embedding-free MLP-Mixer model for on-device NLP that achieves high weight efficiency thanks to a novel projection layer. We evaluate a pNLP-Mixer model of only one megabyte in size on two multilingual semantic parsing datasets, MTOP and multiATIS. Our quantized model achieves 99.4% and 97.8% of the performance of mBERT on MTOP and multiATIS, respectively, while using 170x fewer parameters. Our model consistently beats the state-of-the-art tiny model (pQRNN), which is twice as large, by a margin of up to 7.8% on MTOP.
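To make the size figures above concrete, the following is a rough back-of-the-envelope sketch rather than anything taken from the paper. It assumes a decimal megabyte (10^6 bytes) and 8-bit quantized weights; neither is stated in the abstract, and the helper name `params_for_budget` is hypothetical. Under those assumptions it estimates how many weights fit in a one-megabyte budget and what baseline parameter count the reported 170x reduction would imply.

```python
def params_for_budget(size_bytes: int, bits_per_weight: int) -> int:
    """Return how many weights fit in size_bytes at the given bit width."""
    return (size_bytes * 8) // bits_per_weight


ONE_MB = 1_000_000   # assumption: decimal megabyte
BITS = 8             # assumption: 8-bit quantized weights
REDUCTION = 170      # parameter reduction factor reported in the abstract

# Weights that fit in the one-megabyte budget under the assumptions above.
small_model_params = params_for_budget(ONE_MB, BITS)

# Baseline parameter count implied by the reported reduction factor.
implied_baseline_params = small_model_params * REDUCTION

print(f"small model: ~{small_model_params:,} parameters")
print(f"implied baseline: ~{implied_baseline_params:,} parameters")
```

With 32-bit weights the same budget would hold only a quarter as many parameters, which suggests why quantization matters at this scale.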

