Processing information locked within clinical health records is a challenging task that remains an active area of research in biomedical NLP. In this work, we evaluate a broad set of machine learning techniques, ranging from simple RNNs to specialised transformers such as BioBERT, on a dataset of clinical notes annotated to indicate whether each sample is cancer-related. Furthermore, we employ efficient fine-tuning methods from NLP, namely bottleneck adapters and prompt tuning, to adapt the models to our specialised task. Our evaluations suggest that a frozen BERT model pre-trained on natural language and fine-tuned with bottleneck adapters outperforms all other strategies, including full fine-tuning of the specialised BioBERT model. Based on our findings, we suggest that bottleneck adapters could be a viable strategy for biomedical text mining in low-resource situations where labelled data or processing capacity is limited. The code used in the experiments will be made available at https://github.com/omidrohanian/bottleneck-adapters.
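To make the idea of a bottleneck adapter concrete, the following is a minimal sketch rather than the implementation used in the experiments. It assumes the common formulation of an adapter as a down-projection, a non-linearity, an up-projection, and a residual connection, with only the adapter weights trained while the host model stays frozen. The class name `BottleneckAdapter`, the helper `adapter_param_count`, the ReLU activation, the zero initialisation of the up-projection, and the bottleneck size of 64 are all illustrative assumptions; the abstract does not specify any of them.

```python
import random


def adapter_param_count(hidden_size, bottleneck_size):
    """Trainable parameters (weights and biases) of a single adapter."""
    down = hidden_size * bottleneck_size + bottleneck_size
    up = bottleneck_size * hidden_size + hidden_size
    return down + up


class BottleneckAdapter:
    """Hypothetical pure-Python adapter acting on one hidden-state vector."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down-projection (hidden_size -> bottleneck_size), small random init.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                       for _ in range(bottleneck_size)]
        self.b_down = [0.0] * bottleneck_size
        # Up-projection (bottleneck_size -> hidden_size), zero init (an
        # assumption) so the adapter starts out as the identity function.
        self.w_up = [[0.0] * bottleneck_size for _ in range(hidden_size)]
        self.b_up = [0.0] * hidden_size

    def forward(self, h):
        # Compress the hidden state into the bottleneck and apply ReLU.
        z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(self.w_down, self.b_down)]
        # Expand back to the hidden size.
        delta = [sum(w * a for w, a in zip(row, z)) + b
                 for row, b in zip(self.w_up, self.b_up)]
        # Residual connection: the frozen model's output passes through.
        return [x + d for x, d in zip(h, delta)]


# BERT-base uses a hidden size of 768 and 12 transformer layers.
# The bottleneck size of 64 and one adapter per layer are assumptions.
per_adapter = adapter_param_count(hidden_size=768, bottleneck_size=64)
total_trainable = 12 * per_adapter
```

Under these assumptions, the adapters add roughly 1.2 million trainable parameters, compared with the roughly 110 million parameters of BERT-base that full fine-tuning would update. This gap is what makes the approach attractive when processing capacity is limited.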