Amid the current mental health crisis, identifying potential indicators of mental health issues in social media content has become increasingly important. Overlooking the multifaceted nature of mental and social well-being can harm a person's mental state. In traditional therapy sessions, professionals manually identify the causes and consequences of underlying mental health challenges, a process that is both detailed and time-intensive. We approach this mental health analysis by framing the identification of wellness dimensions in Reddit content as a wellness concept extraction and categorization task. We curate a dataset named WELLXPLAIN, comprising 3,092 entries totaling 72,813 words. Drawing on Halbert L. Dunn's well-regarded theory of wellness, we formulate an annotation scheme and accompanying guidelines. The dataset also includes human-annotated text spans that provide clear explanations for the decisions made during wellness concept categorization. By releasing this dataset and analyzing initial baselines, we aim to spur the development of advanced language models for healthcare-focused concept extraction and categorization.