In dream research, the study of dream content typically relies on the analysis of verbal reports that dreamers provide upon awakening. This analysis is classically performed through manual scoring by trained annotators, at considerable cost in time. While a substantial body of work suggests that natural language processing (NLP) tools can support the automatic analysis of dream reports, previously proposed methods could not reason over a report's full context and required extensive data pre-processing. Furthermore, in most cases, these methods were not validated against standard manual scoring approaches. In this work, we address these limitations by adopting large language models (LLMs) to study and replicate the manual annotation of dream reports, using a mixture of off-the-shelf and bespoke approaches, with a focus on references to emotions in the reports. Our results show that the off-the-shelf method performs poorly, probably because of inherent linguistic differences between reports collected from different individuals or groups of individuals. In contrast, the proposed bespoke text classification method achieves high performance and is robust against potential biases. Overall, these observations indicate that our approach could find application in the analysis of large dream datasets and may favour the reproducibility and comparability of results across studies.