A community needs assessment is a tool used by non-profits and government agencies to quantify the strengths and issues of a community so that resources can be allocated more effectively. Such assessments are increasingly turning to social media conversations to analyze the needs of communities and the assets already present within them. However, manually analyzing the exponentially growing volume of social media conversations is infeasible, and the existing literature offers no computational approach for analyzing how community members discuss the strengths and needs of their community. To address this gap, we introduce the task of identifying, extracting, and categorizing community needs and assets from conversational data using natural language processing methods. To facilitate this task, we introduce the first dataset of community needs and assets, consisting of 3,511 conversations from Reddit annotated by crowdworkers. Using this dataset, we evaluate an utterance-level classification model against a sentiment-classification baseline and a popular large language model (in a zero-shot setting); our model outperforms both baselines, achieving an F1 score of 94% compared to 49% and 61%, respectively. We further observe that conversations about needs carry negative sentiments and emotions, while conversations about assets focus on locations and entities. The dataset is available at https://github.com/towhidabsar/CommunityNeeds.