We introduce DocPolarBERT, a layout-aware BERT model for document understanding that eliminates the need for absolute 2D positional embeddings. We extend self-attention to take text block positions into account in a relative polar coordinate system rather than a Cartesian one. Despite being pre-trained on a dataset more than six times smaller than the widely used IIT-CDIP corpus, DocPolarBERT achieves state-of-the-art results. These results demonstrate that a carefully designed attention mechanism can compensate for reduced pre-training data, offering an efficient and effective alternative for document understanding.
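The core geometric idea, representing the position of one text block relative to another as a distance and an angle instead of Cartesian offsets, might be sketched as follows. This is a minimal illustration only: the function name, the use of block centers, and the radian-valued angle are assumptions for exposition, not the paper's exact attention formulation.

```python
import math

def relative_polar(center_a, center_b):
    """Relative polar coordinates of text block B as seen from text block A.

    center_a, center_b: (x, y) centers of two text blocks (illustrative
    assumption; the model may use a different reference point).
    Returns (r, theta): radial distance and angle in radians.
    """
    dx = center_b[0] - center_a[0]
    dy = center_b[1] - center_a[1]
    r = math.hypot(dx, dy)        # Euclidean distance between centers
    theta = math.atan2(dy, dx)    # angle in (-pi, pi]
    return r, theta

# Example: a block 3 units right and 4 units below another
r, theta = relative_polar((0.0, 0.0), (3.0, 4.0))
```

In a relative scheme such as this, the pair (r, theta) depends only on the offset between the two blocks, so the representation is invariant to where the pair sits on the page, which is what removes the need for absolute 2D positional embeddings.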