Researchers commonly perform sentiment analysis on large collections of short texts, such as tweets, Reddit posts, or newspaper headlines, that all focus on a specific topic, theme, or event. Typically, general-purpose sentiment analysis methods are used. These perform well on average but miss the variation in meaning across contexts: for example, the word "active" has a very different intention and valence in "active lifestyle" than in "active volcano". This work presents a new approach, CIDER (Context Informed Dictionary and sEntiment Reasoner), which performs context-sensitive sentiment analysis: the valence of sentiment-laden terms is inferred from the whole corpus and then used to score the individual texts. In this paper we detail the CIDER algorithm and demonstrate that it outperforms state-of-the-art generalist sentiment analysis on a large collection of tweets about the weather. Our implementation of CIDER is available as a Python package: https://pypi.org/project/ciderpolarity/.
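The two-stage idea described above (infer term valence from the whole corpus, then score each text) can be illustrated with a toy sketch. This is not the CIDER algorithm or the `ciderpolarity` API; the seed lexicon, the function names `infer_polarities` and `score_text`, and the simple co-occurrence averaging rule are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical seed lexicon: a few words whose valence is assumed known.
SEEDS = {"love": 1.0, "great": 1.0, "hate": -1.0, "terrible": -1.0}


def infer_polarities(corpus, seeds=SEEDS):
    """Stage 1 (toy version): give each non-seed word the mean seed
    valence of the texts in which it appears."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text in corpus:
        tokens = text.lower().split()
        seed_vals = [seeds[t] for t in tokens if t in seeds]
        if not seed_vals:
            continue  # a text without seed words carries no signal here
        doc_valence = sum(seed_vals) / len(seed_vals)
        for t in set(tokens):
            if t not in seeds:
                totals[t] += doc_valence
                counts[t] += 1
    return {t: totals[t] / counts[t] for t in totals}


def score_text(text, lexicon):
    """Stage 2: score one text as the mean valence of its known words."""
    vals = [lexicon[t] for t in text.lower().split() if t in lexicon]
    return sum(vals) / len(vals) if vals else 0.0


# Tiny weather-themed corpus, invented for this example.
corpus = [
    "love this sunny morning",
    "great sunny weekend",
    "hate this storm",
    "terrible storm tonight",
]
learned = infer_polarities(corpus)
lexicon = {**SEEDS, **learned}
print(score_text("sunny afternoon", lexicon))
print(score_text("storm warning", lexicon))
```

In this toy corpus, "sunny" only co-occurs with positive seed words and "storm" only with negative ones, so each acquires a corpus-specific valence even though neither is in the seed lexicon. A word such as "this", which appears in both kinds of text, ends up neutral.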