Sentiment analysis is an important tool for aggregating patient voices and can support targeted improvements in healthcare services. A prerequisite is the availability of in-domain data annotated for sentiment. This article documents an effort to add sentiment annotations to free-text comments in patient surveys collected by the Norwegian Institute of Public Health (NIPH). Annotation, however, can be time-consuming and resource-intensive, particularly when it requires domain expertise. We therefore also evaluate a possible alternative to human annotation: using large language models (LLMs) as annotators. We perform an extensive evaluation of this approach with two openly available pretrained LLMs for Norwegian, experimenting with different configurations of prompts and in-context learning and comparing their performance to that of human annotators. We find that even in zero-shot runs, the models perform well above the baseline for binary sentiment, but still cannot compete with human annotators on the full dataset.
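To make the comparison against a baseline concrete, the following is a minimal sketch of how model-assigned binary sentiment labels could be scored against human annotations. It is an illustration under stated assumptions, not the evaluation used in the article: the abstract does not specify the metric or the baseline, so plain accuracy and a majority-class baseline are assumed here, and the labels are made-up toy values rather than NIPH data. The function names are hypothetical.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of items where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty label list")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def majority_baseline(gold):
    """Predict the most frequent gold label for every item (assumed baseline)."""
    label, _ = Counter(gold).most_common(1)[0]
    return [label] * len(gold)


# Toy data for illustration only; not taken from the patient surveys.
human_labels = ["positive", "negative", "negative",
                "positive", "negative", "negative"]
# Hypothetical output of an LLM prompted zero-shot on the same comments.
model_labels = ["positive", "negative", "positive",
                "positive", "negative", "negative"]

baseline_acc = accuracy(human_labels, majority_baseline(human_labels))
model_acc = accuracy(human_labels, model_labels)

print(f"majority baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:             {model_acc:.3f}")
```

A model is only informative as an annotator if it clears this kind of trivial baseline; comparing it to human annotators additionally requires a measure of human agreement on the same items, which this sketch does not cover.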