Social media is a potential source of information from which latent mental states can be inferred through Natural Language Processing (NLP). While narrating real-life experiences, social media users convey feelings of loneliness or an isolated lifestyle that affect their mental well-being. Existing literature on psychological theories points to loneliness as the major consequence of interpersonal risk factors, underscoring the need to investigate loneliness as a major aspect of mental disturbance. We formulate lonesomeness detection in social media posts as an explainable binary classification problem that identifies at-risk users and suggests the need for resilience as a means of early control. To the best of our knowledge, there is no existing explainable dataset, i.e., one with human-readable, annotated text spans, to facilitate further research and development in detecting loneliness that causes mental disturbance. In this work, three experts (a senior clinical psychologist, a rehabilitation counselor, and a social NLP researcher) define annotation schemes and perplexity guidelines to mark the presence or absence of lonesomeness in 3,521 Reddit posts, along with the text spans in the original posts that serve as explanations. We plan to publicly release our dataset, LonXplain, and traditional classifiers as baselines via GitHub.