The Brazilian judiciary, the largest in the world, faces a crisis driven by the slow processing of millions of cases, making efficient methods for analyzing legal texts imperative. We introduce uBERT, a hybrid model that combines Transformer and Recurrent Neural Network architectures to handle long legal texts effectively. Our approach processes the full text regardless of its length while keeping computational overhead reasonable. Our experiments show that uBERT outperforms BERT+LSTM when overlapping input is used and is significantly faster than ULMFiT at processing long legal documents.
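To make the hybrid design concrete, the sketch below shows one common way such a Transformer+RNN pipeline can be assembled: a long document is split into overlapping BERT-sized windows, each window is encoded by the Transformer, and a recurrent layer aggregates the per-window embeddings into a document representation. This is a minimal illustration under stated assumptions, not the paper's actual implementation; the encoder name, GRU aggregator, [CLS] pooling, and the chunk length/stride values are all illustrative choices.

```python
# Hypothetical sketch of a chunk-then-recur pipeline for long legal texts.
# Assumptions (not from the paper): HuggingFace transformers, a BiGRU
# aggregator, [CLS] pooling per chunk, chunk length 512 with 256-token overlap.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class ChunkedBertRnnClassifier(nn.Module):
    def __init__(self, encoder_name="neuralmind/bert-base-portuguese-cased",
                 hidden=256, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.rnn = nn.GRU(self.encoder.config.hidden_size, hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_labels)

    def forward(self, chunk_input_ids, chunk_attention_mask):
        # chunk_input_ids: (num_chunks, chunk_len) for a single document.
        out = self.encoder(input_ids=chunk_input_ids,
                           attention_mask=chunk_attention_mask)
        cls = out.last_hidden_state[:, 0, :]      # one [CLS] vector per chunk
        _, h = self.rnn(cls.unsqueeze(0))         # recur over the chunk sequence
        doc = torch.cat([h[-2], h[-1]], dim=-1)   # final forward/backward states
        return self.head(doc)


def chunk_tokens(tokenizer, text, chunk_len=512, stride=256):
    """Split a long document into overlapping, fixed-size token windows."""
    enc = tokenizer(text, return_overflowing_tokens=True,
                    max_length=chunk_len, stride=stride,
                    truncation=True, padding="max_length",
                    return_tensors="pt")
    return enc["input_ids"], enc["attention_mask"]
```

Because the recurrent layer consumes one embedding per chunk rather than one per token, the number of chunks (and hence the document length) can grow without re-running the Transformer over a longer attention window, which is what keeps the overhead of full-text processing manageable.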