Clinical notes contain valuable unstructured information. Named entity recognition (NER) enables automatic extraction of medical concepts, but benchmarks for Portuguese remain scarce. We evaluated BERT-based models and large language models (LLMs) for clinical NER in Portuguese and tested strategies for addressing multilabel class imbalance. We compared BioBERTpt, BERTimbau, ModernBERT, and mmBERT with LLMs such as GPT-5 and Gemini-2.5 on the public SemClinBr corpus and a private breast cancer dataset. Models were trained under identical conditions and evaluated using precision, recall, and F1-score. To mitigate class imbalance, we explored iterative stratification, weighted loss, and oversampling. The mmBERT-base model achieved the best performance (micro F1 = 0.76), outperforming all other models. Iterative stratification improved class balance and overall performance. Multilingual BERT models, particularly mmBERT, perform strongly for Portuguese clinical NER and can run locally with limited computational resources. Balanced data-splitting strategies further enhance performance.
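Because the headline result is reported as a micro-averaged F1, a minimal sketch of that metric may help readers interpret it. The abstract does not state the matching criterion, so the code below assumes exact matching on `(start, end, label)` spans; the span format, the function name `micro_prf`, and the label strings in the usage example are hypothetical illustrations, not the study's actual evaluation code.

```python
from typing import Iterable, List, NamedTuple, Tuple

# Hypothetical span format: (start offset, end offset, label).
Entity = Tuple[int, int, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def micro_prf(gold_docs: Iterable[List[Entity]],
              pred_docs: Iterable[List[Entity]]) -> Scores:
    """Micro-averaged precision, recall, and F1 over entity spans.

    Counts are pooled across all documents and labels before the ratios
    are computed, so frequent labels dominate the score. Assumes exact
    span-and-label matching (an assumption, not taken from the abstract).
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # predicted and correct
        fp += len(pred_set - gold_set)   # predicted but not in gold
        fn += len(gold_set - pred_set)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Scores(precision, recall, f1)


# Toy usage with made-up labels: 2 true positives, 2 false positives,
# 1 false negative across two documents.
gold = [[(0, 5, "Disorder"), (10, 18, "Procedure")], [(3, 9, "Drug")]]
pred = [[(0, 5, "Disorder"), (10, 18, "Drug")],
        [(3, 9, "Drug"), (12, 15, "Disorder")]]
scores = micro_prf(gold, pred)
print(f"P={scores.precision:.2f} R={scores.recall:.2f} F1={scores.f1:.2f}")
```

Since micro-averaging pools counts across labels, rare entity types contribute little to the score, which is one reason the imbalance-handling strategies are assessed alongside it.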