By exploiting the high degree of parallelism offered by graphics processing units, transformer architectures have enabled tremendous progress in natural language processing. In a traditional masked language model, special MASK tokens replace some of the input tokens, prompting the model to use contextual information from the surrounding words to restore the hidden tokens. In this paper, we explore a task-specific masking framework for pre-trained large language models that enables superior performance on particular downstream tasks drawn from the GLUE benchmark. We develop our own masking algorithm, Typhoon, based on token input gradients, and compare it with standard baselines. We find that Typhoon offers performance competitive with whole-word masking on the MRPC dataset. Our implementation can be found in a public GitHub repository.
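To make the idea of masking based on token input gradients concrete, the following is a minimal sketch and not the Typhoon algorithm itself, whose details the abstract does not give. It assumes that each token's importance is scored by the L2 norm of the loss gradient with respect to its input embedding, and that the highest-scoring tokens are the ones masked. The function names, the `[MASK]` string, and the `mask_ratio` parameter are hypothetical choices made for illustration.

```python
import math

MASK_TOKEN = "[MASK]"  # assumed BERT-style mask token, for illustration only


def gradient_saliency(embedding_grads):
    """Score each token by the L2 norm of its input-embedding gradient.

    `embedding_grads` holds one gradient vector per token. In practice these
    would come from backpropagating a task loss; here they are plain lists.
    """
    return [math.sqrt(sum(g * g for g in grad)) for grad in embedding_grads]


def mask_by_saliency(tokens, saliency, mask_ratio=0.15):
    """Replace the highest-saliency tokens with MASK_TOKEN.

    Assumption: high-gradient tokens are the ones masked. The opposite
    choice (masking low-gradient tokens) would only flip the sort order.
    """
    if len(tokens) != len(saliency):
        raise ValueError("tokens and saliency must have the same length")
    n_mask = max(1, round(mask_ratio * len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: saliency[i], reverse=True)
    chosen = set(ranked[:n_mask])
    return [MASK_TOKEN if i in chosen else tok for i, tok in enumerate(tokens)]


# Toy example with made-up gradient vectors (not real model output).
tokens = ["the", "film", "was", "surprisingly", "good"]
grads = [[0.1, 0.0], [0.3, 0.4], [0.0, 0.1], [1.2, 0.5], [0.6, 0.8]]
masked = mask_by_saliency(tokens, gradient_saliency(grads), mask_ratio=0.4)
print(masked)
```

With these toy gradients, the two tokens with the largest gradient norms ("surprisingly" and "good") are masked. A whole-word masking baseline would instead choose words at random and mask all of their sub-word pieces together.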