The advancement of natural language processing (NLP) in biology hinges on models' ability to interpret intricate biomedical literature. Traditional models often struggle with the complex, domain-specific language of this field. In this paper, we present BioMamba, a pre-trained model specifically designed for biomedical text mining. BioMamba builds upon the Mamba architecture and is pre-trained on an extensive corpus of biomedical literature. Our empirical studies demonstrate that BioMamba significantly outperforms models such as BioBERT and general-domain Mamba across various biomedical tasks. For instance, BioMamba achieves a 100-fold reduction in perplexity and a 4-fold reduction in cross-entropy loss on the BioASQ test set. We provide an overview of the model architecture, pre-training process, and fine-tuning techniques. Additionally, we release the code and trained model to facilitate further research.
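For readers relating the two reported metrics, perplexity is simply the exponential of the mean token-level cross-entropy, so improvements in one imply improvements in the other. The minimal sketch below illustrates this relationship with hypothetical loss values, not figures from the paper.

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    # Perplexity is the exponential of the mean per-token cross-entropy (in nats).
    return math.exp(mean_cross_entropy)

# Hypothetical mean cross-entropy values (nats/token) for illustration only.
baseline_loss = 6.0
biomamba_loss = 1.5

ratio = perplexity(baseline_loss) / perplexity(biomamba_loss)
print(f"Perplexity reduction factor: {ratio:.0f}x")  # ~90x for these example values
```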