Transformer-based models, specifically BERT, have propelled research in various NLP tasks. However, these models accept at most 512 tokens per input, which makes them non-trivial to apply in practical settings with long inputs. Various complex methods claim to overcome this limit, but recent research questions their efficacy across different classification tasks: evaluated on carefully curated long-text datasets, these complex architectures perform on par with or worse than simple baselines. In this work, we propose ChunkBERT, a relatively simple extension to the vanilla BERT architecture that allows any pretrained model to be fine-tuned to perform inference on arbitrarily long text. The proposed method is based on chunking token representations and CNN layers, making it compatible with any pretrained BERT. We evaluate ChunkBERT exclusively on a benchmark for comparing long-text classification models across a variety of tasks (including binary, multi-class, and multi-label classification). A BERT model fine-tuned using the ChunkBERT method performs consistently across long samples in the benchmark while utilizing only a fraction (6.25%) of the original memory footprint. These findings suggest that efficient fine-tuning and inference can be achieved through simple modifications to pretrained BERT models.
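The chunking idea can be illustrated with a minimal sketch. The abstract does not specify the chunk size, the padding scheme, or how the memory figure is derived, so everything below is an assumption made for illustration: the function names `chunk_token_ids` and `relative_attention_memory` are hypothetical, the chunk size of 128 is a guess, and the memory estimate assumes that the self-attention score matrix grows quadratically with sequence length. Under those assumptions, a 128-token chunk needs (128/512)^2 = 6.25% of the attention memory of a full 512-token input, which happens to match the reported figure; the CNN layers that combine chunk representations are not shown.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], chunk_size: int = 128,
                    pad_id: int = 0) -> List[List[int]]:
    """Split a token sequence into fixed-size chunks, padding the last one.

    chunk_size=128 and pad_id=0 are illustrative assumptions, not values
    taken from the abstract.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for start in range(0, len(token_ids), chunk_size):
        chunk = token_ids[start:start + chunk_size]
        # Pad the final chunk so every chunk has the same length.
        chunk = chunk + [pad_id] * (chunk_size - len(chunk))
        chunks.append(chunk)
    return chunks


def relative_attention_memory(chunk_size: int, full_length: int = 512) -> float:
    """Size of one attention score matrix at chunk_size relative to full_length.

    Assumes memory scales with the square of the sequence length.
    """
    return (chunk_size / full_length) ** 2


if __name__ == "__main__":
    ids = list(range(1, 1001))          # a 1000-token input, longer than 512
    chunks = chunk_token_ids(ids)
    print(len(chunks), len(chunks[-1]))  # number of chunks, length of the last
    print(relative_attention_memory(128))
```

Each chunk would then be passed through the pretrained encoder separately, so no single forward pass ever sees more than `chunk_size` tokens, regardless of the total input length.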