Survivors of sexual harassment frequently share their experiences on social media, revealing their feelings and emotions and seeking advice. We observe that on Reddit, survivors regularly write long posts that combine (i) a description of a sexual harassment incident, (ii) its effect on the survivor, including their feelings and emotions, and (iii) the advice being sought. We term such posts MeToo posts, even though they may not be tagged as such and may appear in diverse subreddits. A prospective helper (such as a counselor or even a casual reader) must understand a survivor's needs from such posts, but long posts can be time-consuming to read and respond to. Accordingly, we address the problem of extracting key information from a long MeToo post. We develop a natural language-based model that identifies the sentences in a post that fall into any of the above three categories. In ten-fold cross-validation on a dataset, our model achieves a macro F1 score of 0.82. In addition, we contribute MeThree, a dataset comprising 8,947 labeled sentences extracted from Reddit posts. We apply the LIWC-22 toolkit to MeThree to understand how language patterns in sentences of the three categories reveal differences in emotional tone, authenticity, and other aspects.
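For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1 score can be computed and combined with a ten-fold split. It is not the evaluation code behind the 0.82 figure: the label names (`incident`, `effect`, `advice`), the helper names `macro_f1` and `k_folds`, and the simple round-robin fold assignment are all assumptions made for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over `labels`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


def k_folds(n_items, k=10):
    """Yield (train_indices, test_indices) for k round-robin folds.

    Hypothetical helper: a real evaluation would typically shuffle
    and stratify the data before splitting.
    """
    for i in range(k):
        test = list(range(i, n_items, k))
        held_out = set(test)
        train = [j for j in range(n_items) if j not in held_out]
        yield train, test


# Toy example with hypothetical sentence labels.
labels = ["incident", "effect", "advice"]
y_true = ["incident", "incident", "effect", "advice"]
y_pred = ["incident", "effect", "effect", "advice"]
score = macro_f1(y_true, y_pred, labels)
```

Because macro F1 averages the per-class scores without weighting by class frequency, a rare category counts as much as a common one. In a cross-validated setting, one common convention is to compute the score on each held-out fold and average the results; whether the paper aggregates this way is not stated in the abstract.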