Psychological corpora in NLP are collections of texts used to analyze human psychology, emotions, and mental health. These texts allow researchers to study psychological constructs, detect mental health issues, and analyze emotional language. However, mental health data are difficult to collect reliably from social media because of assumptions made by the collectors. A more pragmatic strategy is to gather data through open-ended questions and then assess the responses with self-report screening surveys. This method has been applied successfully to English, a language with many psychological NLP resources. The same cannot be said for Romanian, which currently has no open-source mental health corpus. To address this gap, we created PsihoRo, the first corpus for depression and anxiety in Romanian, using a form with 6 open-ended questions together with the standardized PHQ-9 and GAD-7 screening questionnaires. Although it may seem small, with texts from 205 respondents, PsihoRo is a first step towards understanding and analyzing texts regarding the mental health of the Romanian population. We employ statistical analysis, text analysis using Romanian LIWC, emotion detection, and topic modeling to present the most important features of this new resource to the NLP community.
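As an illustration of how responses to the two screening questionnaires are conventionally turned into labels, the following is a minimal sketch in Python. It assumes the standard scoring, in which each item is rated 0 to 3 and the item scores are summed, together with the commonly published severity bands. The abstract does not state which cut-offs the authors applied, and the function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

# Commonly published severity bands as (inclusive upper bound, label).
# Assumption: the abstract does not say which cut-offs were used for PsihoRo.
PHQ9_BANDS: List[Tuple[int, str]] = [
    (4, "minimal"), (9, "mild"), (14, "moderate"),
    (19, "moderately severe"), (27, "severe"),
]
GAD7_BANDS: List[Tuple[int, str]] = [
    (4, "minimal"), (9, "mild"), (14, "moderate"), (21, "severe"),
]


def _score(items: Sequence[int], n_items: int,
           bands: List[Tuple[int, str]]) -> Tuple[int, str]:
    """Sum the item ratings and map the total to a severity label."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(items)}")
    if any(not 0 <= x <= 3 for x in items):
        raise ValueError("each item must be rated from 0 to 3")
    total = sum(items)
    for upper, label in bands:
        if total <= upper:
            return total, label
    raise AssertionError("unreachable: total exceeds the last band")


def score_phq9(items: Sequence[int]) -> Tuple[int, str]:
    """Hypothetical helper: score the 9 PHQ-9 items (depression)."""
    return _score(items, 9, PHQ9_BANDS)


def score_gad7(items: Sequence[int]) -> Tuple[int, str]:
    """Hypothetical helper: score the 7 GAD-7 items (anxiety)."""
    return _score(items, 7, GAD7_BANDS)
```

Under these assumptions, a respondent's two totals could be stored alongside their six free-text answers, either as numeric scores or as categorical severity labels.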