BERT has shown great success across a wide variety of NLP tasks, but its attention mechanism limits its ability to handle long inputs. Longformer, ETC, and BigBird addressed this issue and effectively solved the quadratic dependency problem. However, we find that these models are not sufficient, and we propose LittleBird, a novel model based on BigBird with improved speed and memory footprint while maintaining accuracy. In particular, we devise a more flexible and efficient position representation method based on Attention with Linear Biases (ALiBi). We also show that replacing BigBird's method of representing global information with pack and unpack attention is more effective. The proposed model can work on long inputs even after being pre-trained on short inputs, and can be trained efficiently by reusing an existing pre-trained language model for short inputs. This is a significant benefit for low-resource languages, where large amounts of long text data are difficult to obtain. Our experiments show that LittleBird works very well in a variety of languages, achieving high performance in question answering tasks, particularly on KorQuAD2.0, a Korean question answering dataset for long paragraphs.
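To make the idea of a linear attention bias concrete, the following is a minimal sketch of an ALiBi-style distance penalty added to attention scores before the softmax. It is not LittleBird's position representation, which the abstract only describes as based on ALiBi. The function names are hypothetical, the symmetric `|i - j|` form is an assumption made for a bidirectional encoder (the original ALiBi formulation is causal), and the slope schedule assumes a power-of-two number of heads.

```python
import math


def alibi_slopes(num_heads):
    """Geometric per-head slopes; this sketch assumes a power-of-two head count."""
    if num_heads < 1 or num_heads & (num_heads - 1):
        raise ValueError("this sketch assumes num_heads is a power of two")
    # Slopes are 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads.
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def distance_bias(seq_len, slope):
    """Symmetric linear bias -slope * |i - j| (assumption: bidirectional variant)."""
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


def biased_attention_weights(scores, bias):
    """Add the bias to raw attention scores, then apply a row-wise softmax."""
    weights = []
    for score_row, bias_row in zip(scores, bias):
        logits = [s + b for s, b in zip(score_row, bias_row)]
        peak = max(logits)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Usage: with uniform raw scores, nearer tokens receive more attention weight.
slope = alibi_slopes(8)[0]
uniform_scores = [[0.0] * 4 for _ in range(4)]
weights = biased_attention_weights(uniform_scores, distance_bias(4, slope))
```

Because the bias depends only on the distance between positions and involves no learned per-position parameters, the same function applies unchanged to any sequence length, which is the property that lets this family of methods be trained on short inputs and applied to longer ones.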